1. Initial Preparations

1.1 Load all the Necessary Packages for the Subsequent Data Analysis

The R environment is initialised by loading all the essential packages required for the upcoming statistical analysis. This configuration ensures that all the necessary functions and libraries are available, facilitating the efficient manipulation, analysis and visualisation of data.

# Packages required for the subsequent data analysis (duplicate entries removed)
my.packages <- c("dplyr", "tidyverse", 
                 "ggplot2", "readr",
                 "MASS", "texreg", "regions", 
                 "arrow", "eurostat", "openxlsx", 
                 "haven", "fmtr", 
                 "readxl", "tidyr", "foreign", "car", 
                 "brant", "lmtest", "knitr", "writexl", 
                 "lme4", "nuts", 
                 "viridis", "RColorBrewer", 
                 "sf", "kableExtra", 
                 "psych", "ltm", "reshape2", "stargazer", 
                 "tmap", "glmmTMB", "sjPlot", 
                 "effects", "scales", "ggeffects")

# Load all the packages for the subsequent data analysis;
# invisible() suppresses the list that lapply() would otherwise print.
# MASS is attached after dplyr and masks select(), so select() is called as dplyr::select() below.
invisible(lapply(my.packages, library, 
                 character.only = TRUE))

# Remove the "my.packages" from the environment 
remove(my.packages) 
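
The loading step above assumes that every package is already installed. A minimal sketch of a pre-check is shown here; the shortened vector `pkgs` and the name `missing_pkgs` are illustrative and not part of the original script.

```r
# Illustrative subset of the package list; extend as needed
pkgs <- c("dplyr", "haven", "readr")

# Identify packages that are not yet installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```

Running the check before `library()` gives one message listing everything that is missing, rather than an error at the first missing package.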

1.2 Loading Replication Datasets from Previous Quantitative Analyses


# Load the necessary datasets
example1 <- read_dta("00_Replication Quelle 25/replication_data_JEG.dta")
example2 <- read_dta("00_Replication Quelle 28/RJPP_1314534_supp2.dta")
example3 <- read_dta("00_Replication Quelle 01/Data for replication-1.dta")

# Load and preprocess the fourth replication dataset
example4 <- read_dta("00_Replication Quelle 59/Dataset 1 JPP.dta") %>%
  mutate(
    # Map numeric country codes to their corresponding ISO country abbreviations
    cntry = case_when(
      cntry_n == 26 ~ "SI",   # Slovenia
      cntry_n == 25 ~ "SE",   # Sweden
      cntry_n == 20 ~ "NO",   # Norway
      cntry_n == 7  ~ "DK",   # Denmark
      cntry_n == 5  ~ "CZ",   # Czech Republic
      cntry_n == 15 ~ "HU",   # Hungary
      cntry_n == 10 ~ "FI",   # Finland
      cntry_n == 1  ~ "BE",   # Belgium
      cntry_n == 11 ~ "FR",   # France
      cntry_n == 3  ~ "CH",   # Switzerland
      cntry_n == 6  ~ "DE",   # Germany
      cntry_n == 16 ~ "IE",   # Ireland
      cntry_n == 9  ~ "ES",   # Spain
      cntry_n == 12 ~ "GB",   # United Kingdom
      cntry_n == 8  ~ "EE",   # Estonia
      cntry_n == 13 ~ "GR",   # Greece
      cntry_n == 18 ~ "LT",   # Lithuania
      cntry_n == 22 ~ "PT",   # Portugal
      cntry_n == 19 ~ "NL",   # Netherlands
      cntry_n == 21 ~ "PL",   # Poland
      cntry_n == 23 ~ "RO",   # Romania
      TRUE ~ as.character(cntry_n)  # Default case for unexpected values
    )
  )
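
The same mapping can also be expressed as a named lookup vector, which keeps the codes in one place. The following is only a sketch on toy input: `iso_lookup` is a hypothetical name, only three of the codes from the mapping above are repeated, and 99 stands in for an unexpected value.

```r
# Named lookup vector: names are the numeric codes, values the ISO codes
iso_lookup <- c("26" = "SI", "25" = "SE", "20" = "NO")

# Toy input; 99 is not part of the lookup
cntry_n <- c(26, 20, 99, 25)

# Look up each code; codes without an entry come back as NA
cntry <- unname(iso_lookup[as.character(cntry_n)])

# Fall back to the numeric code as text, mirroring the default case above
cntry <- ifelse(is.na(cntry), as.character(cntry_n), cntry)
```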

2. Loading the ESS Data

The ESS dataset can be downloaded from the following website: https://www.europeansocialsurvey.org/data-portal

Loading the Dataset: The ESS dataset is loaded from the .dta file into the ess variable using read_dta.

Column Selection: Unnecessary columns (e.g., name, edition, proddate) are removed with dplyr::select to retain only relevant variables for analysis.

# Load the European Social Survey (ESS) dataset
ess <- read_dta("Datasets/ESS_data.dta")

# Select relevant columns and remove unnecessary ones
ess <- ess %>%
  dplyr::select(
    -name,       # Remove variable 'name' which is not needed
    -edition,    # Remove variable 'edition' which is not needed
    -proddate,   # Remove variable 'proddate' which is not needed
    -idno,       # Remove variable 'idno' which is not needed
    -pweight,    # Remove variable 'pweight' which is not needed
    -atchctr,    # Remove variable 'atchctr' which is not needed
    -anweight,   # Remove variable 'anweight' which is not needed
    -prob,       # Remove variable 'prob' which is not needed
    -psu,        # Remove variable 'psu' which is not needed
    -stratum,    # Remove variable 'stratum' which is not needed
    -pdwrkp,     # Remove variable 'pdwrkp' which is not needed
    -nacer1,     # Remove variable 'nacer1' which is not needed
    -nacer11,    # Remove variable 'nacer11' which is not needed
    -nacer2,     # Remove variable 'nacer2' which is not needed
    -ipeqopta,   # Remove variable 'ipeqopta' which is not needed
    -implvdm     # Remove variable 'implvdm' which is not needed
  )
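
Note that `dplyr::select(-name)` stops with an error if a listed column is absent, which can happen with a different ESS extract. A base R sketch that tolerates absent columns is given below; `ess_toy` and `drop_cols` are made-up names and the data are invented.

```r
# Toy data frame standing in for the ESS file
ess_toy <- data.frame(
  idno    = 1:3,
  name    = "ESS",
  cntry   = c("AT", "BE", "DE"),
  gincdif = c(1, 2, 5)
)

# Columns to drop; 'proddate' is deliberately absent from the toy data
drop_cols <- c("name", "idno", "proddate")

# setdiff() keeps every column that is not listed and ignores absent names
ess_toy <- ess_toy[, setdiff(names(ess_toy), drop_cols), drop = FALSE]
```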

3. Cleaning the ESS Dataset

This R script performs a series of data preparation and transformation steps on the ESS dataset. Key operations include:

Filtering Data:

  • Only respondents who are citizens of the survey country (ctzcntr equal to 1) are retained for analysis.

  • ESS Round 11 is excluded because of extensive missing data.

  • The dataset is restricted to countries that were EU members during the relevant ESS rounds.

Variable Creation and Transformation:

  • ESS Year: A new variable ess_year is created to represent the survey year based on the essround variable.

  • Country Names: A name_country variable is added for more descriptive country names.

  • Handling Missing Values: Several variables (e.g., gincdif) are recoded so that the ESS missing-value codes become NA.

  • Rescaling Variables: Certain variables (e.g., gincdif) are reverse-coded to facilitate their interpretation.

  • Dummy Variables: New dummy variables are created for gender, employment status, education, and other categories.

Education Variables:

  • edulvla and edulvlb are processed to create a unified education level variable educ.

Activity Status:

  • Dummy variables are created for different activity statuses (e.g., unemployed, employed, retired).

Welfare Attitudes Index:

  • Several variables related to welfare attitudes (e.g., gvcldcr, sbeqsoc) are processed, rescaled, and standardised to later create an index reflecting attitudes towards welfare.

# Keep only respondents who are citizens of the survey country (ctzcntr == 1).
# which() drops rows where ctzcntr is NA; plain logical indexing would return them as all-NA rows.
ess <- ess[which(ess$ctzcntr == 1), ]

# Create a variable for the corresponding ESS survey year
ess1 <- ess %>% mutate(ess_year = case_when(
  essround == 1 ~ "2002", 
  essround == 2 ~ "2004", 
  essround == 3 ~ "2006", 
  essround == 4 ~ "2008", 
  essround == 5 ~ "2010", 
  essround == 6 ~ "2012", 
  essround == 7 ~ "2014", 
  essround == 8 ~ "2016",
  essround == 9 ~ "2018",
  essround == 10 ~ "2020",
  essround == 11 ~ "2023"
  ))

# Remove the original ESS dataset from the environment to save memory
remove(ess)

# Exclude data from ESS round 11 due to extensive missing data
ess1 <- subset(ess1, 
               essround != 11)

# Convert 'ess_year' from character to numeric 
ess1$ess_year <- as.numeric(ess1$ess_year)

# Create a column for country names based on country codes
ess1 <- ess1 %>% mutate(name_country = case_when(
  cntry == "AT" ~ "Austria",
  cntry == "BE" ~ "Belgium",
  cntry == "BG" ~ "Bulgaria",
  cntry == "HR" ~ "Croatia",
  cntry == "CY" ~ "Cyprus",
  cntry == "CH" ~ "Switzerland",
  cntry == "CZ" ~ "Czech Republic",
  cntry == "DK" ~ "Denmark",
  cntry == "EE" ~ "Estonia",
  cntry == "FI" ~ "Finland",
  cntry == "FR" ~ "France",
  cntry == "DE" ~ "Germany",
  cntry == "GR" ~ "Greece",
  cntry == "HU" ~ "Hungary",
  cntry == "IE" ~ "Ireland",
  cntry == "IT" ~ "Italy",
  cntry == "IS" ~ "Iceland",
  cntry == "LV" ~ "Latvia",
  cntry == "LT" ~ "Lithuania",
  cntry == "NO" ~ "Norway",
  cntry == "LU" ~ "Luxembourg",
  cntry == "NL" ~ "Netherlands",
  cntry == "PL" ~ "Poland",
  cntry == "PT" ~ "Portugal",
  cntry == "RO" ~ "Romania",
  cntry == "SK" ~ "Slovakia",
  cntry == "SI" ~ "Slovenia",
  cntry == "ES" ~ "Spain",
  cntry == "SE" ~ "Sweden",
  cntry == "GB" ~ "United Kingdom",
  TRUE ~ "Other"
))

# Recode the 'gincdif' variable: convert the missing-value codes (7, 8, 9) to NA,
# then reverse the 1-5 scale (6 - x) so that higher values indicate stronger agreement
ess1 <- ess1 %>%
  mutate(gincdif = case_when(
    gincdif %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ gincdif
  ))

ess1$gincdif_rescaled <- ess1$gincdif
ess1$gincdif_rescaled <- 6 - ess1$gincdif_rescaled
attributes(ess1$gincdif_rescaled) <- attributes(ess1$gincdif)
ess1 <- subset(ess1, select = -gincdif)
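
The recode-then-reverse pattern used for gincdif recurs for many variables below. As a sketch of the underlying logic, the two steps can be written as small helper functions; `recode_missing()` and `reverse_scale()` are hypothetical names and are not used in the original script. The helpers return plain numeric vectors, so any haven value labels would be dropped.

```r
# Hypothetical helper: set the listed missing-value codes to NA
recode_missing <- function(x, codes) {
  x <- as.numeric(x)
  x[x %in% codes] <- NA_real_
  x
}

# Hypothetical helper: reverse a scale that runs from `low` to `high`
reverse_scale <- function(x, low, high) {
  (low + high) - x
}

# Toy responses on the 1-5 agreement scale, with 7 and 8 as missing codes
gincdif <- c(1, 2, 5, 7, 8, 3)

gincdif_clean    <- recode_missing(gincdif, codes = c(7, 8, 9))
gincdif_rescaled <- reverse_scale(gincdif_clean, low = 1, high = 5)
```

For a 1-5 scale the reversal is 6 - x; for a 0-10 scale the same formula gives 10 - x, which keeps the reversed variable in the original range.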

# Recode the 'euftf' variable to handle missing values 
# by converting specific codes (77, 88, 99) to NA 
ess1 <- ess1 %>%
  mutate(euftf = case_when(
    euftf %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ euftf
  ))

# Recode the 'lrscale' variable to handle missing values 
# by converting specific codes (77, 88, 99) to NA 
ess1 <- ess1 %>%
  mutate(lrscale = case_when(
    lrscale %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ lrscale
))


# Recode the 'trstprl' variable to handle missing values 
# by converting specific codes (77, 88, 99) to NA 
ess1 <- ess1 %>%
  mutate(trstprl = case_when(
    trstprl %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ trstprl
  ))

# Recode the 'gndr' variable to handle missing values 
# by converting the missing-value code (9) to NA 
# Create a binary (dummy) variable so that 1 (male) becomes 0 and 2 (female) becomes 1
ess1 <- ess1 %>%
  mutate(gndr = case_when(
    gndr %in% c(9) ~ NA_real_,
    TRUE ~ gndr
    ))

ess1$gndr_dummyfemale <- ifelse(ess1$gndr == 1, 0, 1)

attributes(ess1$gndr_dummyfemale) <- attributes(ess1$gndr)

ess1 <- subset(ess1, select = -gndr)

# Recode the 'agea' variable to handle missing values 
# by converting specific code (999) to NA 
ess1 <- ess1 %>%
  mutate(agea = case_when(
    agea %in% c(999) ~ NA_real_,
    TRUE ~ agea
  ))

# Recode the 'yrbrn' variable to handle missing values 
# by converting the missing-value codes (7777, 8888, 9999) to NA 
ess1 <- ess1 %>%
  mutate(yrbrn = case_when(
    yrbrn %in% c(7777, 8888, 9999) ~ NA_real_,
    TRUE ~ yrbrn
  ))

# Recode the 'domicil' variable to handle missing values 
# by converting the missing-value codes (7, 8, 9) to NA and create a dummy variable for urban residence (=1)
ess1 <- ess1 %>%
  mutate(domicil = case_when(
    domicil %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ domicil
  ))

ess1$urban_dummy <- ifelse(ess1$domicil == 1, 1, 0)


# Recode the 'lknemny' variable to handle missing values 
# by converting the missing-value codes (7, 8, 9) to NA 
ess1 <- ess1 %>%
  mutate(lknemny = case_when(
    lknemny %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ lknemny
    ))


# Recode 'edulvla' and 'edulvlb' education variables to handle 
# missing values and create a unified education variable 'educ'.

# For 'edulvla', special codes 55, 77, 88, and 99 indicate missing
# values and are recoded as NA.
ess1 <- ess1 %>%
  mutate(edulvla = case_when(
    edulvla %in% c(55, 77, 88, 99) ~ NA_real_,
    TRUE ~ edulvla
  ))

# Similarly, for 'edulvlb', recode special codes 5555, 7777, 8888, and 9999 as NA.
ess1 <- ess1 %>%
  mutate(edulvlb = case_when(
    edulvlb %in% c(5555, 7777, 8888, 9999) ~ NA_real_,
    TRUE ~ edulvlb
  ))

# Initialise a new variable 'educ' to hold a combined education level based on 'essround'.
# NA_real_ (rather than logical NA) keeps the type compatible with if_else() below.
ess1$educ <- NA_real_

# Assign 'educ' the value of 'edulvla' for rounds 1 through 4.
ess1 <- ess1 %>%
  mutate(educ = if_else(essround %in% c(1, 2, 3, 4), edulvla, educ))

# Recode specific values of 'edulvlb' to consistent education levels, ranging from 0 to 5.
# These recodes correspond to defined education levels across survey rounds.
ess1 <- ess1 %>%
  mutate(edulvlb = case_when(
    edulvlb == 0 ~ 0, 
    edulvlb == 113 ~ 1, 
    edulvlb %in% c(129, 212, 213, 221, 222, 223, 229) ~ 2, 
    edulvlb %in% c(311, 312, 313, 321, 322, 323) ~ 3, 
    edulvlb %in% c(412, 413, 421, 422, 423) ~ 4, 
    edulvlb %in% c(510, 520, 610, 620, 710, 720, 800) ~ 5
  ))

# Assign 'educ' the value of 'edulvlb' for rounds 5 through 11.
ess1 <- ess1 %>%
  mutate(educ = if_else(essround %in% c(5, 6, 7, 8, 9, 10, 11),
                        edulvlb, educ))

# Remove the original 'edulvla' and 'edulvlb' variables as they are now consolidated into 'educ'.
ess1 <- subset(ess1, select = -edulvla)
ess1 <- subset(ess1, select = -edulvlb)

# Handle missing values in education-related variables:
# Replace codes indicating missing data in 'eduyrs' (years of education completed) with NA.
ess1 <- ess1 %>%
  mutate(eduyrs = case_when(
    eduyrs %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ eduyrs
    ))

# Recode missing values for 'mnactic' (main activity in the last 7 days).
# Special codes 9, 66, 77, 88, and 99 are set to NA.
ess1 <- ess1 %>%
  mutate(mnactic = case_when(
    mnactic %in% c(9, 66, 77, 88, 99) ~ NA_real_,
    TRUE ~ mnactic
  ))

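# Create dummy variables based on 'mnactic' to indicate specific main activity statuses.
# %in% returns FALSE for NA, so the explicit NA check keeps missing 'mnactic' values as NA,
# consistent with the other dummies below.
ess1$unemployed_dummy <- ifelse(is.na(ess1$mnactic), NA, 
                                ifelse(ess1$mnactic %in% c(3, 4), 1, 0))  # Unemployed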
ess1$employed_dummy <- ifelse(ess1$mnactic == 1, 1, 0)              # Employed
ess1$disabled_dummy <- ifelse(ess1$mnactic == 5, 1, 0)              # Permanently sick or disabled
ess1$ineducation_dummy <- ifelse(ess1$mnactic == 2, 1, 0)           # In education
ess1$retired_dummy <- ifelse(ess1$mnactic == 6, 1, 0)               # Retired
ess1$community_services_dummy <- ifelse(ess1$mnactic == 7, 1, 0)    # Community or military service
ess1$housework_dummy <- ifelse(ess1$mnactic == 8, 1, 0)             # Housework

# Filter the data to include only countries that were EU member states during the specified ESS rounds.
# Exclude data for countries in years before they joined the EU.

# Bulgaria - Joined in 2007, exclude data for years 2002, 2004, 2006.
ess1 <- ess1 %>%
  filter(!(cntry == "BG" & ess_year %in% c(2002, 2004, 
                                           2006)))

# Romania - Joined in 2007, exclude data for years 2002, 2004, 2006.
ess1 <- ess1 %>%
  filter(!(cntry == "RO" & ess_year %in% c(2002, 2004, 
                                           2006)))

# Croatia - Joined in 2013, exclude data for years before 2013. 
ess1 <- ess1 %>%
  filter(!(cntry == "HR" & ess_year %in% c(2002, 2004, 
                                           2006, 2008, 2010, 
                                           2012)))

# Cyprus, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Poland, Slovakia, Slovenia - Joined in 2004.
# Exclude data for each of these countries from the 2002 ESS round.
ess1 <- ess1 %>%
  filter(!(cntry == "CY" & ess_year == 2002)) %>%
  filter(!(cntry == "CZ" & ess_year == 2002)) %>%
  filter(!(cntry == "EE" & ess_year == 2002)) %>%
  filter(!(cntry == "HU" & ess_year == 2002)) %>%
  filter(!(cntry == "LV" & ess_year == 2002)) %>%
  filter(!(cntry == "LT" & ess_year == 2002)) %>%
  filter(!(cntry == "PL" & ess_year == 2002)) %>%
  filter(!(cntry == "SK" & ess_year == 2002)) %>%
  filter(!(cntry == "SI" & ess_year == 2002))
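
The country-by-country filters above all follow one rule: keep an observation only if the survey year is not earlier than the accession year. A compact sketch of that rule with a lookup vector is shown below on toy data; `accession`, `toy` and `toy_eu` are illustrative names. Countries without an entry in the lookup (here DE) are left untouched, as in the filters above.

```r
# EU accession years for the countries handled above
accession <- c(BG = 2007, RO = 2007, HR = 2013,
               CY = 2004, CZ = 2004, EE = 2004, HU = 2004, LV = 2004,
               LT = 2004, PL = 2004, SK = 2004, SI = 2004)

# Toy country-year observations
toy <- data.frame(
  cntry    = c("BG", "BG", "HR", "PL", "PL", "DE"),
  ess_year = c(2006, 2008, 2012, 2002, 2004, 2002)
)

# Accession year per row; NA for countries that are not in the lookup
join_year <- accession[toy$cntry]

# Keep rows without a lookup entry and rows from the accession year onwards
keep   <- is.na(join_year) | toy$ess_year >= join_year
toy_eu <- toy[keep, ]
```

Because ESS rounds in this script fall on even years, the comparison reproduces the year lists used above (for example, Croatia is kept from the 2014 round onwards).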


# Now recode some other important variables to construct my index for welfare attitudes 

# Handle missing values for government responsibility variables:
# Recode 'gvcldcr' (government responsibility for child care, scale from 0 to 10):
# replace the missing codes (77, 88, 99) with NA.
ess1 <- ess1 %>%
  mutate(gvcldcr = case_when(
    gvcldcr %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ gvcldcr
  ))

# Similarly, for `gvslvol` (standard of living for the elderly) and
# `gvslvue` (standard of living for the unemployed), replace missing codes (77, 88, 99) with NA.
ess1 <- ess1 %>%
  mutate(gvslvol = case_when(
    gvslvol %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ gvslvol
  ))

ess1 <- ess1 %>%
  mutate(gvslvue = case_when(
    gvslvue %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ gvslvue
  ))

# sbeqsoc 
# Social benefits/services lead to a more equal society on a scale from 1 (agree strongly) to 5 (disagree strongly) 
ess1 <- ess1 %>%
  mutate(sbeqsoc = case_when(
    sbeqsoc %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ sbeqsoc
  ))

ess1$sbeqsoc_rescaled <- 6 - ess1$sbeqsoc

# sbprvpv 
# Social benefits/services prevent widespread poverty on a scale from 1 (agree strongly) to 5 (disagree strongly) 
ess1 <- ess1 %>%
  mutate(sbprvpv = case_when(
    sbprvpv %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ sbprvpv
  ))

ess1$sbprvpv_rescaled <- 6 - ess1$sbprvpv

# sbstrec 
# Social benefits/services place too great strain on economy on a scale from 1 (agree strongly) to 5 (disagree strongly) 
ess1 <- ess1 %>%
  mutate(sbstrec = case_when(
    sbstrec %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ sbstrec
  ))

# imbgeco (immigration good or bad for country's economy) on a scale from 0 (bad) to 10 (good)
ess1 <- ess1 %>%
  mutate(imbgeco = case_when(
    imbgeco %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ imbgeco
    ))

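# Reverse the 0-10 scale so that higher values indicate a more negative view;
# 10 - x keeps the reversed variable in the 0-10 range (11 - x would shift it to 1-11)
ess1$imbgeco_rescaled <- 10 - ess1$imbgeco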
attributes(ess1$imbgeco_rescaled) <- attributes(ess1$imbgeco)
ess1 <- subset(ess1, select = -imbgeco)

# Clean `imueclt` (immigration impact on cultural life: 0 = threat, 10 = enriched) 
# by setting missing values (77, 88, 99) to NA.
# Rescale `imueclt` so higher values indicate a greater threat (0 = enriched, 10 = threat) 
# for consistent interpretation.
ess1 <- ess1 %>%
  mutate(imueclt = case_when(
    imueclt %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ imueclt
))

# 10 - x maps 0 to 10 and 10 to 0, matching the 0-10 range described above
ess1$imueclt_rescaled <- 10 - ess1$imueclt
attributes(ess1$imueclt_rescaled) <- attributes(ess1$imueclt)
ess1 <- subset(ess1, select = -imueclt)

# Clean `imsclbn` (timing for migrants’ rights to social benefits: 1 = immediately, 5 = never)
# recoding missing values (7, 8, 9) as NA.
ess1 <- ess1 %>%
  mutate(imsclbn = case_when(
    imsclbn %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ imsclbn
    ))

# Clean `imwbcnt` (immigration impact on country: 0 = worse, 10 = better)
# recoding missing values (77, 88, 99) as NA.
ess1 <- ess1 %>%
  mutate(imwbcnt = case_when(
    imwbcnt %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ imwbcnt
    ))

# Reverse the 0-10 scale (10 - x) so that higher values indicate a more negative view
ess1$imwbcnt_rescaled <- 10 - ess1$imwbcnt
attributes(ess1$imwbcnt_rescaled) <- attributes(ess1$imwbcnt)
ess1 <- subset(ess1, select = -imwbcnt)

# smdfslv: 
# for fair society, differences in standard of living should be small 
# on a scale from 1 (agree strongly) to 5 (disagree strongly)
ess1 <- ess1 %>%
  mutate(smdfslv = case_when(
    smdfslv %in% c(7, 8, 9) ~ NA_real_,
    TRUE ~ smdfslv
))

ess1$smdfslv_rescaled <- 6 - ess1$smdfslv
attributes(ess1$smdfslv_rescaled) <- attributes(ess1$smdfslv)
ess1 <- subset(ess1, select = -smdfslv)

# Create an index similar to the one in Alesina et al. (2020) using z-scores.
# scale() returns a one-column matrix, so as.numeric() is used to store plain numeric vectors.

# Calculate the z-score for `gvcldcr` (government responsibility for child care)
ess1$gvcldcr_z_score <- as.numeric(scale(ess1$gvcldcr))

# Calculate the z-score for `gvslvue` (government responsibility for supporting the unemployed)
ess1$gvslvue_z_score <- as.numeric(scale(ess1$gvslvue))

# Calculate the z-score for `gvslvol` (government responsibility for supporting the elderly)
ess1$gvslvol_z_score <- as.numeric(scale(ess1$gvslvol))

# gincdif_rescaled_z_score: opinion on whether the government should reduce income differences 
# (reverse-coded so that higher values indicate agreement) 
ess1$gincdif_rescaled_z_score <- as.numeric(scale(ess1$gincdif_rescaled))

# sbeqsoc_rescaled_z_score: opinion on whether social benefits/services create a more equal society 
# (reverse-coded so that higher values indicate agreement)
ess1$sbeqsoc_rescaled_z_score <- as.numeric(scale(ess1$sbeqsoc_rescaled))

# sbprvpv_rescaled_z_score: opinion on whether social benefits/services prevent widespread poverty 
# (reverse-coded so that higher values indicate agreement)
ess1$sbprvpv_rescaled_z_score <- as.numeric(scale(ess1$sbprvpv_rescaled))
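
For reference, the z-score of a value x is (x - mean) / sd, which is what `scale()` computes with its default settings. The sketch below spells this out on invented toy data. How the z-scores are later combined into the index is not shown in this section, so the row mean used here (`welfare_index`) is only an assumption for illustration.

```r
# Toy data with one missing response
toy <- data.frame(
  gvcldcr = c(2, 5, 8, NA),
  gvslvol = c(4, 6, 10, 7)
)

# z-score: centre on the mean and divide by the standard deviation
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

toy$gvcldcr_z <- z_score(toy$gvcldcr)
toy$gvslvol_z <- z_score(toy$gvslvol)

# Assumed aggregation: mean of the available z-scores per respondent
toy$welfare_index <- rowMeans(toy[, c("gvcldcr_z", "gvslvol_z")], na.rm = TRUE)
```

With `na.rm = TRUE`, a respondent with one missing item still receives an index value based on the remaining items. Whether that is desirable depends on the analysis.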

4. National-Level Control Variables

4.1 Social Benefits (in % of GDP)

The dataset can be obtained from the following website: https://ec.europa.eu/eurostat/databrowser/view/spr_exp_sum__custom_12289262/default/table?lang=en

Data Cleaning:

  • Filtered national_social_protection for relevant data.

  • Renamed columns and adjusted country codes for Greece and the UK.

  • Removed unnecessary countries and regions.

  • Converted values to numeric.

Data Merging:

  • Merged ess1 with national_social_protection.

Verification:

  • Confirmed no missing values in the merged dataset.

Cleanup:

  • Removed the national_social_protection dataset.

# Read the social protection data, skipping metadata columns that are not needed
national_social_protection <- read_csv(
  "Datasets/National_Social_Protection.csv",
  col_types = cols(DATAFLOW = col_skip(), 
                   `LAST UPDATE` = col_skip(), 
                   freq = col_skip(), 
                   OBS_FLAG = col_skip()))

national_social_protection <- national_social_protection %>%
  filter(unit == "PC_GDP")

national_social_protection <- national_social_protection %>%
  filter(spdeps == "TOTALNOREROUTE")

national_social_protection <- national_social_protection[,!(names(national_social_protection) %in% c("unit","spdeps"))]

# Rename country abbreviation 
national_social_protection <- national_social_protection %>% 
  mutate(geo = ifelse(geo == "EL", "GR", geo))

# Rename country abbreviation
national_social_protection <- national_social_protection %>% 
  mutate(geo = ifelse(geo == "UK", "GB", geo))

# Rename the columns 
national_social_protection <- national_social_protection %>% 
  rename(national_social_protection = OBS_VALUE, 
         ess_year = TIME_PERIOD, 
         cntry = geo)

# Drop the years 1990 to 2000, which precede the first ESS round (2002)
national_social_protection <- national_social_protection %>%
  filter(!(ess_year >= 1990 & ess_year <= 2000))

# List of countries to be removed
countries_to_remove <- c("AL", "TR", "MT", "MK", "EA18", "EA19", "EA20", "EEA", 
                         "EU27_2007", "EU27_2020", "EU28", "EA12")

# Remove the specified countries
national_social_protection <- national_social_protection %>%
  filter(!cntry %in% countries_to_remove)

# Make it numeric
national_social_protection$national_social_protection <- as.numeric(national_social_protection$national_social_protection)


# Merge both datasets together 
ess1 <- merge(ess1, national_social_protection, by = c("cntry", "ess_year"), all.x = TRUE)

# No missing values!
summary(ess1$national_social_protection)

# remove the dataset from the environment
remove(national_social_protection)
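
A left merge (`all.x = TRUE`) silently leaves NA wherever a country-year combination has no match in the national-level data. The sketch below shows one way to list such combinations after merging. All values are invented, and `survey`, `macro` and `unmatched` are illustrative names.

```r
# Toy survey rows and toy national-level data (invented values)
survey <- data.frame(
  cntry    = c("AT", "AT", "GR", "GB"),
  ess_year = c(2002, 2004, 2004, 2004)
)
macro <- data.frame(
  cntry    = c("AT", "AT", "GR"),
  ess_year = c(2002, 2004, 2004),
  national_social_protection = c(10, 20, 30)
)

# Left merge: every survey row is kept
merged <- merge(survey, macro, by = c("cntry", "ess_year"), all.x = TRUE)

# Country-year combinations that found no match in the national-level data
unmatched <- unique(
  merged[is.na(merged$national_social_protection), c("cntry", "ess_year")]
)
```

An empty `unmatched` data frame corresponds to the "no missing values" check above. A non-empty one points to the country codes or years that still need harmonising.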

4.2 National Corruption Perception Index

The dataset can be obtained from the following website: https://www.transparency.org/en/cpi/2023

Read and Transform Data:

  • Loaded corruption_perception.xlsx.

  • Converted to long format with pivot_longer().

  • Dropped unnecessary columns and renamed columns for clarity.

Data Merging:

  • Merged the transformed corruption data (corruption_perception_long) with ess1.

Adjust Scores:

  • Adjusted the corruption perception scores so that higher values indicate more corruption.

Cleanup:

  • Removed intermediate data frames and verified no missing values in ess1.
# Read the data into the environment 
corruption_perception <- read_excel("Datasets/National_Corruption_Perception.xlsx")

# Reshape from wide to long format 
corruption_perception_long <- tidyr::pivot_longer(corruption_perception,
                       cols = -c(country, cntry),
                       names_to = "year",
                       values_to = "score")

# remove the dataset from the environment 
remove(corruption_perception)

corruption_perception_long <- corruption_perception_long %>%
  dplyr::select(-country)

# Rename columns and convert the year to numeric so that the merge keys
# have the same type as 'ess_year' in ess1
corruption_perception_long <- corruption_perception_long %>%
  dplyr::rename(national_corruption_perception = score, 
                ess_year = year) %>%
  mutate(ess_year = as.numeric(ess_year))

# Merge it with the ESS dataset 
ess1 <- merge(ess1, corruption_perception_long, 
              by = c("cntry", "ess_year"), all.x = TRUE)

# remove the dataset from the environment
remove(corruption_perception_long) 

# Adjust it so that higher values indicate more perceived corruption
ess1$national_corruption_perception <- 100-ess1$national_corruption_perception

# no missing values!
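
The reshaping and inversion steps can be illustrated on a small wide table. This is a sketch with invented scores, assuming that the year columns are named with plain four-digit years. The `names_transform` argument of `pivot_longer()` converts the former column names to integers in the same step.

```r
library(tidyr)

# Toy wide table: one column per year (invented scores)
cpi_wide <- data.frame(
  country = c("Austria", "Belgium"),
  cntry   = c("AT", "BE"),
  `2012`  = c(69, 75),
  `2014`  = c(72, 76),
  check.names = FALSE
)

# Wide to long: one row per country and year, year stored as integer
cpi_long <- pivot_longer(
  cpi_wide,
  cols            = -c(country, cntry),
  names_to        = "ess_year",
  values_to       = "score",
  names_transform = list(ess_year = as.integer)
)

# Invert the 0-100 score so that higher values mean more perceived corruption
cpi_long$national_corruption_perception <- 100 - cpi_long$score
```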

4.3 National Unemployment Rates (20-64 years)

The data can be obtained from the following site: https://ec.europa.eu/eurostat/databrowser/view/lfsq_urgan__custom_12075980/default/table?lang=en (working-age population, 20 to 64 years)

Read and Transform Data:

  • Loaded national_unemployment.csv.

  • Adjusted year format and calculated average unemployment rates per year and country.

  • Renamed columns for clarity and updated country codes (e.g., UK to GB, EL to GR).

Data Merging:

  • Merged the processed unemployment data with ess1.

Cleanup:

  • Removed intermediate datasets and ensured no missing values in ess1.
# Read and process national unemployment data
national_unemployment <- read_csv("Datasets/National_Unemployment.csv")


# Trim the quarterly time period to the year (drop the last three characters)
# and convert it to numeric so that it matches 'ess_year' in ess1
national_unemployment$TIME_PERIOD <- as.numeric(
  substr(national_unemployment$TIME_PERIOD, 1, 
         nchar(national_unemployment$TIME_PERIOD) - 3))

# Calculate mean unemployment value per year and country;
# .groups = "drop" returns an ungrouped data frame
national_unemployment <- national_unemployment %>%
  group_by(TIME_PERIOD, geo) %>%
  summarize(mean_OBS_VALUE = mean(OBS_VALUE), .groups = "drop") %>%
  rename(ess_year = TIME_PERIOD, cntry = geo, 
         national_unemployment_level = mean_OBS_VALUE)

# Correct country codes
national_unemployment <- national_unemployment %>%
  mutate(cntry = case_when(
    cntry == "UK" ~ "GB",
    cntry == "EL" ~ "GR",
    TRUE ~ cntry
  ))

# Merge with ess1
ess1 <- merge(ess1, national_unemployment, 
              by = c("cntry", "ess_year"), all.x = TRUE)

# No missing values

# Cleanup and remove the dataset from the environment
remove(national_unemployment)
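
The aggregation from quarterly to annual values can be checked on a small example. The sketch below uses base R and invented rates. The "YYYY-Qn" time format is an assumption that is consistent with trimming three characters above.

```r
# Toy quarterly unemployment rates (invented values)
quarterly <- data.frame(
  geo         = c("EL", "EL", "EL", "EL", "UK", "UK"),
  TIME_PERIOD = c("2004-Q1", "2004-Q2", "2004-Q3", "2004-Q4", "2004-Q1", "2004-Q2"),
  OBS_VALUE   = c(10, 12, 11, 11, 4, 6)
)

# Extract the year from the quarterly label
quarterly$ess_year <- as.numeric(substr(quarterly$TIME_PERIOD, 1, 4))

# Mean rate per country and year
annual <- aggregate(OBS_VALUE ~ geo + ess_year, data = quarterly, FUN = mean)
names(annual) <- c("cntry", "ess_year", "national_unemployment_level")

# Harmonise Eurostat country codes with the ESS codes
recode <- c(UK = "GB", EL = "GR")
cntry_chr <- as.character(annual$cntry)
annual$cntry <- ifelse(cntry_chr %in% names(recode), recode[cntry_chr], cntry_chr)
```

The formula interface of `aggregate()` drops rows with missing values by default, whereas `mean()` without `na.rm = TRUE` returns NA for a year as soon as one quarter is missing. The two approaches can therefore differ when quarters are incomplete.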

4.4 Immigration Flows by Country of Previous Residence

The dataset can be obtained from the following website: https://ec.europa.eu/eurostat/databrowser/view/migr_imm5prv__custom_12112657/default/table?lang=en

Create Panel Data Frame:

  • Created panel_immigration with years (2002–2022) and countries.

Read and Transform Data:

  • Loaded immigration data from CSV files for different periods (2002-2023).

  • Standardised column names and country codes (e.g., UK to GB, EL to GR).

  • Merged data from all periods into a single dataset.

Adjust Specific Values:

  • Added missing values for Switzerland and Poland based on external sources.

  • Updated missing values using OECD migration database for various countries.

Merge with Panel Data:

  • Merged the cleaned immigration data with panel_immigration.

Cleanup:

  • Removed intermediate datasets.
# Create a panel data frame 
set.seed(123) 
ess_year <- 2002:2022
cntry <- c("AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "ES", 
            "FI", "FR", "GB", "GR", "HR", "HU", "IE", "IS", "IT", "LT", 
            "LU", "LV", "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK")

# format it as a data frame
panel_immigration <- data.frame(ess_year = rep(ess_year, length(cntry)), 
                                cntry = rep(cntry, each = length(ess_year)))

# Immigration data from 2020 to 2023 
immigration_flows_2020_2023 <- read_csv("Datasets/EU_Immigration_flows_2020_2023.csv")

immigration_flows_2020_2023 <- dplyr::select(immigration_flows_2020_2023,TIME_PERIOD, geo, 
                                             OBS_VALUE) 

immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Immigration data from 2013 to 2019 
immigration_flows_2013_2019 <- read_csv("Datasets/EU_Immigration_flows_2013_2019.csv")

immigration_flows_2013_2019 <- dplyr::select(immigration_flows_2013_2019,TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Immigration data from 2007 to 2012 
immigration_flows_2007_2012 <- read_csv("Datasets/EU_Immigration_flows_2007_2012.csv")
immigration_flows_2007_2012 <- dplyr::select(immigration_flows_2007_2012,TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Immigration data from 2004 to 2006 
immigration_flows_2004_2006 <- read_csv("Datasets/EU_Immigration_flows_2004_2006.csv")

immigration_flows_2004_2006 <- dplyr::select(immigration_flows_2004_2006,TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Immigration data from 2002 to 2003 
immigration_flows_2002_2003 <- read_csv("Immigration_flows_2002_2003.csv")

immigration_flows_2002_2003 <- dplyr::select(immigration_flows_2002_2003,TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Combine all periods into a single dataset 
immigration <- rbind(immigration_flows_2020_2023, 
                    immigration_flows_2013_2019, 
                    immigration_flows_2007_2012, 
                    immigration_flows_2004_2006, 
                    immigration_flows_2002_2003)

panel_immigration <- merge(panel_immigration, immigration, 
                           by = c("cntry", "ess_year"), all.x = TRUE)
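
The five period files above go through identical steps (select three columns, rename them, recode UK and EL). As a sketch, these steps could be wrapped in one function and applied to each extract. `standardise_flows()` is a hypothetical helper, and the toy data frames stand in for the CSV files, which are not read here.

```r
# Hypothetical helper: keep the three relevant columns, rename them,
# and harmonise the Eurostat country codes with the ESS codes
standardise_flows <- function(df) {
  out <- df[, c("TIME_PERIOD", "geo", "OBS_VALUE")]
  names(out) <- c("ess_year", "cntry", "EU_immigration_flow")
  out$cntry[out$cntry == "UK"] <- "GB"
  out$cntry[out$cntry == "EL"] <- "GR"
  out
}

# Toy extracts standing in for two of the period files (invented values)
flows_a <- data.frame(TIME_PERIOD = c(2002, 2003), geo = c("UK", "AT"),
                      OBS_VALUE = c(100, 200), OBS_FLAG = NA)
flows_b <- data.frame(TIME_PERIOD = 2004, geo = "EL",
                      OBS_VALUE = 300, OBS_FLAG = NA)

# Standardise each extract and stack the results
immigration_toy <- do.call(rbind, lapply(list(flows_a, flows_b), standardise_flows))
```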

# Some values are missing and are filled in from other sources below

# I will take the immigration by citizenship for CH: 
# https://ec.europa.eu/eurostat/databrowser/view/migr_imm1ctz__custom_12179741/default/table?lang=en
panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2004 & cntry == "CH", 58103, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2005 & cntry == "CH", 58954, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2006 & cntry == "CH", 66003, 
                                       EU_immigration_flow))


panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2008 & cntry == "CH", 113575, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2009 & cntry == "CH", 91138, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2010 & cntry == "CH", 91208, 
                                       EU_immigration_flow))

# Poland 2008: https://stat.gov.pl/en/topics/population/internationa-migration/main-directions-of-emigration-and-immigration-in-the-years-1966-2020-migration-for-permanent-residence,2,2.html
panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2008 & cntry == "PL", 10834, 
                                       EU_immigration_flow))


remove(immigration_flows_2002_2003, immigration_flows_2004_2006, 
       immigration_flows_2007_2012, immigration_flows_2013_2019, immigration_flows_2020_2023)

# I do have some missing values and for this reason I will utilise the OECD international migration database: https://data-explorer.oecd.org/vis?fs[0]=Topic%2C1%7CSociety%23SOC%23%7CMigration%23SOC_MIG%23&pg=0&fc=Topic&bp=true&snb=3&vw=tb&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_MIG%40DF_MIG&df[ag]=OECD.ELS.IMD&df[vs]=1.0&dq=.EU15.A.B11._T...&pd=2002%2C&to[TIME_PERIOD]=false

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "AT", 17188, 
                                       EU_immigration_flow))


panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "BE", 30225, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "CH", 49302, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "IE", 15500, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "LU", 8200, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "PT", 4301, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "BE", 30457, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "CH", 49751, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "DE", 98709, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "ES", 69924, 
                                       EU_immigration_flow))


panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "HU", 1527, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "IE", 17900, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "LU", 9182, 
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "PT", 3843, 
                                       EU_immigration_flow))
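
The single-cell corrections above all follow the same pattern, so they could also be kept in one small lookup table and applied in a single step. The following is only a minimal sketch on toy data, not the code used in this analysis: the objects `toy_panel`, `patches` and `toy_patched` are hypothetical, the value 100 for Austria is an arbitrary placeholder, and only three of the corrections from above are reproduced. It assumes a dplyr version that provides `rows_update()` (1.0.0 or later).

```r
library(dplyr)

# Toy stand-in for panel_immigration (hypothetical object, illustration only)
toy_panel <- tibble(
  cntry               = c("CH", "CH", "PL", "AT"),
  ess_year            = c(2004, 2005, 2008, 2004),
  EU_immigration_flow = c(NA, NA, NA, 100)   # 100 is an arbitrary placeholder
)

# One row per manual correction; values copied from the if_else() calls above
patches <- tribble(
  ~cntry, ~ess_year, ~EU_immigration_flow,
  "CH",   2004,      58103,
  "CH",   2005,      58954,
  "PL",   2008,      10834
)

# Overwrite the matching country-year cells; all other rows stay untouched
toy_patched <- rows_update(toy_panel, patches, by = c("cntry", "ess_year"))
print(toy_patched)
```

Keeping the corrections in a table makes it easier to see at a glance which country-years were filled from external sources.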

4.5 Emigration Flows by Country of Next Residence

The dataset can be obtained from the following website: https://ec.europa.eu/eurostat/databrowser/view/migr_emi3nxt__custom_12116655/default/table?lang=en

Create Panel Data Frame:

  • Created panel_emigration with years (2002–2022) and countries.

Read and Transform Data:

  • Loaded emigration data from CSV files for different periods (2002-2023).

  • Standardised column names and country codes (e.g., UK to GB, EL to GR).

  • Merged data from all periods into a single dataset.

Adjust Specific Values:

  • Added missing values for Poland and Switzerland based on external sources.

Merge with Panel Data:

  • Merged the cleaned emigration data with panel_emigration.

Cleanup:

  • Removed intermediate datasets.
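
The "Read and Transform Data" steps are identical for every period file, so they could be wrapped in one helper. The sketch below is an assumption-laden illustration rather than the code used here: `standardise_flows()` is a hypothetical function, the input is a toy tibble instead of a CSV file, and the `OBS_FLAG` column only stands in for the extra columns that are dropped.

```r
library(dplyr)

# Hypothetical helper: keep the three relevant columns, rename them,
# and harmonise the Eurostat country codes with the ESS ones
standardise_flows <- function(raw, value_name) {
  out <- raw %>%
    dplyr::select(ess_year = TIME_PERIOD, cntry = geo, value = OBS_VALUE) %>%
    mutate(cntry = case_when(
      cntry == "UK" ~ "GB",   # United Kingdom
      cntry == "EL" ~ "GR",   # Greece
      TRUE          ~ cntry
    ))
  # Give the value column its final name, e.g. "EU_emigration_flow"
  names(out)[names(out) == "value"] <- value_name
  out
}

# Toy input with the same column names as the downloaded files (values are made up)
toy_raw <- tibble(
  TIME_PERIOD = c(2020, 2020, 2021),
  geo         = c("UK", "EL", "AT"),
  OBS_VALUE   = c(10, 20, 30),
  OBS_FLAG    = c(NA, "p", NA)
)

toy_flows <- standardise_flows(toy_raw, "EU_emigration_flow")
print(toy_flows)
```

The same helper could then be applied to each period file in turn before the results are bound together.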
# Create a panel data frame 
set.seed(123) 
ess_year <- 2002:2022
cntry <- c("AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "ES", 
            "FI", "FR", "GB", "GR", "HR", "HU", "IE", "IS", "IT", "LT", 
            "LU", "LV", "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK")

# Format it as a data frame
panel_emigration <- data.frame(ess_year = rep(ess_year, length(cntry)), 
                               cntry = rep(cntry, each = length(ess_year)))

# Emigration data 2020 to 2023 
emigration_flows_2020_2023 <- read_csv("Datasets/EU_emigration_flows_2020_2023.csv")

emigration_flows_2020_2023 <- dplyr::select(emigration_flows_2020_2023,TIME_PERIOD, 
                                            geo, OBS_VALUE) 

emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Emigration data from 2013 to 2019 
emigration_flows_2013_2019 <- read_csv("Datasets/EU_emigration_flows_2013_2019.csv")

emigration_flows_2013_2019 <- dplyr::select(emigration_flows_2013_2019,TIME_PERIOD, 
                                            geo, OBS_VALUE) 

emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Emigration data from 2007 to 2012 
emigration_flows_2007_2012 <- read_csv("Datasets/EU_emigration_flows_2007_2012.csv")

emigration_flows_2007_2012 <- dplyr::select(emigration_flows_2007_2012,TIME_PERIOD, 
                                            geo, OBS_VALUE) 

emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Emigration data from 2004 to 2006 
emigration_flows_2004_2006 <- read_csv("Datasets/EU_emigration_flows_2004_2006.csv")

emigration_flows_2004_2006 <- dplyr::select(emigration_flows_2004_2006,TIME_PERIOD, geo, 
                                            OBS_VALUE) 

emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# Emigration data from 2002 to 2003 
emigration_flows_2002_2003 <- read_csv("Datasets/EU_emigration_flows_2002_2003.csv")

emigration_flows_2002_2003 <- dplyr::select(emigration_flows_2002_2003,TIME_PERIOD, 
                                            geo, OBS_VALUE) 

emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))


## Merge everything together 
emigration <- rbind(emigration_flows_2020_2023, emigration_flows_2013_2019, 
                   emigration_flows_2007_2012,
                   emigration_flows_2004_2006, emigration_flows_2002_2003)

panel_emigration <- merge(panel_emigration, emigration, 
                          by = c("cntry", "ess_year"), all.x = TRUE)

# I do have some missing values that I will adjust 

# Poland 2008: 
# https://stat.gov.pl/en/topics/population/internationa-migration/main-directions-of-emigration-and-immigration-in-the-years-1966-2020-migration-for-permanent-residence,2,2.html
panel_emigration <- panel_emigration %>%
  mutate(EU_emigration_flow = if_else(ess_year == 2008 & cntry == "PL", 24946, 
                                      EU_emigration_flow))

# Poland 2006: 
# https://stat.gov.pl/en/topics/population/internationa-migration/main-directions-of-emigration-and-immigration-in-the-years-1966-2020-migration-for-permanent-residence,2,2.html
panel_emigration <- panel_emigration %>%
  mutate(EU_emigration_flow = if_else(ess_year == 2006 & cntry == "PL", 40618, 
                                      EU_emigration_flow))

# Switzerland 2008: 
panel_emigration <- panel_emigration %>%
  mutate(EU_emigration_flow = if_else(ess_year == 2008 & cntry == "CH", 24552, 
                                      EU_emigration_flow))
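
After patching individual cells, it can help to list the country-years that are still missing. The following is a minimal sketch on toy data: `toy_panel` and `remaining_gaps` are hypothetical objects, and the value 500 is an arbitrary placeholder, not a real flow.

```r
library(dplyr)

# Toy stand-in for panel_emigration (hypothetical object, illustration only)
toy_panel <- tibble(
  cntry              = c("CH", "CH", "PL", "PL"),
  ess_year           = c(2007, 2008, 2007, 2008),
  EU_emigration_flow = c(NA, 24552, 500, 24946)   # 500 is an arbitrary placeholder
)

# Country-years that still have no emigration flow after the manual corrections
remaining_gaps <- toy_panel %>%
  filter(is.na(EU_emigration_flow)) %>%
  dplyr::select(cntry, ess_year)

print(remaining_gaps)
```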

4.6 Create an Intra-EU Migration Measure

Merge Data:

  • Combined panel_immigration and panel_emigration into eu_migration.

Calculate Net Migration:

  • Computed EU_net_migration as the difference between immigration and emigration flows.

Calculate 4-Year Cumulative Net Migration:

  • Used rollapply to calculate 4-year cumulative immigration and emigration flows.

  • Computed national_net_migration_4yr as the difference between cumulative immigration and emigration flows.

Merge with ESS Data:

  • Merged eu_migration with ess1.

Create Dummy Variables:

  • Created immigration_affected for positive net migration.

  • Created emigration_affected for negative net migration.

Subset Data:

  • Filtered ess1 into ess1_emigration (emigration affected) and ess1_immigration (immigration affected).

Cleanup:

  • Removed intermediate datasets and variables.
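
To make the rolling calculation concrete, the following sketch applies the same `rollapply()` settings to a short toy vector. The values are arbitrary and the object names `toy_flows` and `cum4` are hypothetical; the call assumes the zoo package is installed.

```r
library(zoo)

# Arbitrary toy flows for one country, ordered by year, with one missing year
toy_flows <- c(10, 20, NA, 40, 50, 60)

# Right-aligned 4-year window: each value sums the current year and the
# three preceding years; the first three positions have no full window
cum4 <- as.numeric(rollapply(toy_flows, width = 4, FUN = sum,
                             na.rm = TRUE, fill = NA, align = "right"))
print(cum4)
# Position 4: 10 + 20 + 40 = 70 (the NA is dropped by na.rm = TRUE)
```

Because `na.rm = TRUE` is passed on to `sum()`, a missing year inside the window is silently treated as zero, and a window consisting only of missing values returns 0 rather than NA.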
# Merge the panel_immigration and panel_emigration together 
eu_migration <- merge(panel_immigration, 
                      panel_emigration, by = c("ess_year", "cntry"))

# Now calculate net EU migration based on flows 
eu_migration$EU_net_migration <- (eu_migration$EU_immigration_flow) - (eu_migration$EU_emigration_flow)


# Now calculate cumulative immigration and emigration flows over a 4-year window (current year plus the three preceding years) 
eu_migration <- eu_migration %>%
  group_by(cntry) %>%
  arrange(ess_year) %>%
  mutate(
    EU_immigration_cumulative_4yr = as.numeric(rollapply(EU_immigration_flow, 
                                                         width = 4, FUN = sum, 
                                                         na.rm = TRUE, fill = NA, 
                                                         align = "right")),
    EU_emigration_cumulative_4yr = as.numeric(rollapply(EU_emigration_flow, 
                                                        width = 4, FUN = sum, 
                                                        na.rm = TRUE, 
                                                        fill = NA, 
                                                        align = "right")
  )) %>%
  ungroup()

# Now calculate net EU migration based on cumulative values for the previous 4 years
eu_migration$national_net_migration_4yr <- (eu_migration$EU_immigration_cumulative_4yr) - (eu_migration$EU_emigration_cumulative_4yr)

ess1 <- merge(ess1, eu_migration, by =c("cntry", "ess_year"), all.x = TRUE)

# Create a dummy variable to indicate whether a country experienced positive net EU migration
ess1$immigration_affected <- ifelse(ess1$EU_net_migration > 0, 1, 0)

# Create a dummy variable to indicate whether a country experienced negative net EU migration
ess1$emigration_affected <- ifelse(ess1$EU_net_migration < 0, 1, 0)

ess1_emigration <- subset(ess1, emigration_affected == 1)

ess1_immigration <- subset(ess1, immigration_affected == 1)

# Remove the datasets from the environment since they will not be used 
remove(panel_emigration, panel_immigration, panel_immigration_c, 
       immigration, emigration_flows_2002_2003, emigration_flows_2004_2006, 
       emigration_flows_2007_2012, emigration_flows_2013_2019, 
       emigration_flows_2020_2023, 
       emigration,
       immigration.citizenship)

remove(cntry)
remove(ess_year)
remove(countries_to_remove)

5. Cleaning the NUTS Codes

Unfortunately, the variable ‘regunit’ has missing data because the regional unit was recorded differently in some rounds. I therefore have to address this issue manually for each country that is part of the European Social Survey (ESS). I use the 2016 NUTS codes.

Countries for which the recoding remains problematic: Finland (2002, 2004, 2006 and 2008), Greece (2004), Ireland (2006 and 2008) and Norway (2020 and 2023).

Countries whose NUTS regions changed over time: France, Greece and Hungary (for Pest and Budapest).

Count Missing Values:

  • Calculates the number of missing values in the regunit column of ess1 and prints the result.

NUTS Code Overview:

  • Retrieves an overview of NUTS codes using regions::nuts_changes.

Data Imputation by Country:

  • Austria: Imputes missing regunit values with NUTS level 2 and assigns appropriate region codes based on regionat and essround.

  • Belgium: Imputes missing regunit values with NUTS level 1 and assigns region codes based on regionbe and essround.

  • Bulgaria: Imputes missing regunit values with NUTS level 3 and assigns region codes based on regionbg and essround.

  • Switzerland: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioach and essround.

  • Cyprus: Imputes missing regunit values with NUTS level 1 for rounds 3 and 4.

  • Czech Republic: Imputes missing regunit values with NUTS level 2 or 3, and assigns region codes based on regioacz or regioncz depending on essround.

  • Germany: Imputes missing regunit values with NUTS level 1 and assigns region codes based on regionde and essround.

  • Denmark: Imputes missing regunit values with NUTS level 2 or 3, and assigns region codes based on regioadk or regiondk depending on essround.

  • Estonia: Imputes missing regunit values with NUTS level 3 and assigns region codes based on regionee and essround.

  • Spain: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regiones for rounds 1 and 2, and regioaes for rounds 3 and 4.

  • United Kingdom: Imputes missing regunit values with NUTS level 1 and assigns region codes based on regiongb across all rounds.

  • Slovakia: Imputes missing regunit values with NUTS level 3 and assigns region codes based on regionsk and essround.

  • Sweden: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioaes for rounds 1 to 4.

  • Romania: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioaes for rounds 3 and 4.

  • Poland: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioaes for rounds 1 to 4.

  • Netherlands: Imputes missing regunit values with NUTS level 3 and assigns region codes based on regioaes for rounds 1 to 4.

  • Luxembourg: Imputes missing regunit values with NUTS level 1 and assigns region codes based on regioaes for rounds 1 and 2.

  • Latvia: Imputes missing regunit values with NUTS level 3 and assigns region codes based on regioaes for rounds 3 and 4.

  • Lithuania: Imputes missing regunit values with NUTS level 3 and assigns region codes based on regioaes for round 4.

  • Italy: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioaes for rounds 1 and 2.

  • Iceland: Imputes missing regunit values with NUTS level 1 and assigns region codes based on regioaes for round 2.

  • Portugal: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioaes for rounds 1 to 4.

  • Finland: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioaes for rounds 1 to 4.

  • Greece: Imputes missing regunit values with NUTS level 2 and assigns region codes based on regioaes for rounds 1, 2, and 4.

Update Missing Values for Round 10: Assigns specific NUTS region codes to missing “regunit” values for countries in round 10 based on predefined codes.

Adjust Region Codes: Modifies region codes to align with the official NUTS classification for various countries across different survey rounds, including Estonia, Ireland, Finland, Greece, France, Lithuania, Poland, Slovenia, Croatia, and Hungary.

Remove Unnecessary Columns: Removes outdated regional columns from the dataset that are no longer relevant.

Exclude Norway Data: Filters out data for Norway in round 10 due to significant changes in regional classifications.

Filter Hungary Data: Excludes specific regional data for Hungary in earlier rounds to ensure consistency in regional coding.

Create NUTS Columns

  • NUTS 1: Assigns the region code to the nuts1 column based on the regunit value and applies specific adjustments for various countries (e.g., AT, BE, BG, CY, CZ, CH, DK, EE, FI, FR, GR, HR, HU, IE, IS, IT, LT, LU, LV, NL, NO, PL, PT, RO, SE, SI, SK).

  • NUTS 2: Assigns the region code to the nuts2 column for regunit value 2 and makes specific adjustments for certain countries (e.g., BG, CZ, DK, EE, FI, HU, HR, IE, LT, LV, NL, SI, SK, LU, IS, CY, DE, GB, BE, IT).

  • Data Cleaning and Adjustments:

    • Replaces 99999 with NA in the region column.

    • Removes rows with missing nuts2 values.

    • Renames region to nuts for consistency.

    • Merges ess1 with Eurostat geospatial data for geographic information.

    • Updates NAME_LATN to proper case formatting.

    • Adds year_census based on the ess_year value.

    • Standardises certain NUTS 2 codes for Ireland.
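
The country-by-country imputation described above follows one recipe: select the rows with a missing regunit, translate the country-specific region code into a NUTS code, and set the NUTS level. As a compact illustration of that recipe, the sketch below uses a named lookup vector instead of one assignment per region. It is only a sketch under stated assumptions: `toy_ess`, `at_lookup` and `needs_fix` are hypothetical names, the rows are made up, and `regionat` is assumed to be a plain numeric column.

```r
# Toy stand-in for ess1 (hypothetical rows, illustration only)
toy_ess <- data.frame(
  cntry    = c("AT", "AT", "AT", "BE"),
  essround = c(1, 2, 3, 1),
  regunit  = c(NA, NA, 2, NA),
  regionat = c(1, 9, 5, NA),
  region   = c(NA, NA, "AT32", NA),
  stringsAsFactors = FALSE
)

# Austrian ESS region codes -> NUTS 2 codes, written once as a named vector
at_lookup <- c("1" = "AT11", "2" = "AT21", "3" = "AT12",
               "4" = "AT31", "5" = "AT32", "6" = "AT22",
               "7" = "AT33", "8" = "AT34", "9" = "AT13")

# Austrian rows from rounds 1 to 4 that lack a regional unit
needs_fix <- is.na(toy_ess$regunit) &
  toy_ess$essround %in% 1:4 &
  toy_ess$cntry == "AT"

# Codes without an entry in the lookup (e.g. 999) become NA automatically
toy_ess$region[needs_fix] <- unname(at_lookup[as.character(toy_ess$regionat[needs_fix])])
toy_ess$regunit[needs_fix] <- 2   # NUTS level 2

print(toy_ess)
```

A lookup vector keeps each mapping in one place, which makes duplicated or mistyped codes easier to spot.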

# How many missing values do I have? 
missing_values <- sum(is.na(ess1$regunit))
cat("Number of missing values for the variable 'regunit':", missing_values)

# I need an overview with all the different nuts codes (reference is 
nuts_overview <- regions::nuts_changes

# Austria # -> round 1, 2, 3 and 4 and NUTS level 2 
missing_regunit_at <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "AT")
ess1$region[missing_regunit_at & ess1$regionat == 1] <- "AT11" # Burgenland
ess1$region[missing_regunit_at & ess1$regionat == 2] <- "AT21" # Kärnten
ess1$region[missing_regunit_at & ess1$regionat == 3] <- "AT12" # Niederösterreich 
ess1$region[missing_regunit_at & ess1$regionat == 4] <- "AT31" # Oberösterreich
ess1$region[missing_regunit_at & ess1$regionat == 5] <- "AT32" # Salzburg
ess1$region[missing_regunit_at & ess1$regionat == 6] <- "AT22" # Steiermark
ess1$region[missing_regunit_at & ess1$regionat == 7] <- "AT33" # Tirol
ess1$region[missing_regunit_at & ess1$regionat == 8] <- "AT34" # Vorarlberg
ess1$region[missing_regunit_at & ess1$regionat == 9] <- "AT13" # Wien
ess1$region[missing_regunit_at & ess1$regionat == 999] <- NA # not available
ess1$regunit[missing_regunit_at] <- 2 # Assign NUTS level 2

# Belgium # -> round 1, 2, 3, and 4 and NUTS level 1 
missing_regunit_be <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "BE")
ess1$region[missing_regunit_be & ess1$regionbe == 1] <- "BE2" # Flemish Region
ess1$region[missing_regunit_be & ess1$regionbe == 2] <- "BE1" # Brussels Region
ess1$region[missing_regunit_be & ess1$regionbe == 3] <- "BE3" # Walloon Region
ess1$region[missing_regunit_be & ess1$regionbe == 999] <- NA # not available
ess1$regunit[missing_regunit_be] <- 1 #Assign NUTS level 1

# Bulgaria # -> round 3 and 4 and NUTS level 3
missing_regunit_bg <- is.na(ess1$regunit) & (ess1$essround %in% c(3, 4)) & (ess1$cntry == "BG")
ess1$regunit[missing_regunit_bg] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_bg & ess1$regionbg == 1] <- "BG413" # Blagoevgrad
ess1$region[missing_regunit_bg & ess1$regionbg == 2] <- "BG341" # Bourgas
ess1$region[missing_regunit_bg & ess1$regionbg == 3] <- "BG331" # Varna
ess1$region[missing_regunit_bg & ess1$regionbg == 4] <- "BG321" # Veliko Tarnovo
ess1$region[missing_regunit_bg & ess1$regionbg == 5] <- "BG311" # Vidin
ess1$region[missing_regunit_bg & ess1$regionbg == 6] <- "BG313" # Vratca
ess1$region[missing_regunit_bg & ess1$regionbg == 7] <- "BG322" # Gabrovo
ess1$region[missing_regunit_bg & ess1$regionbg == 8] <- "BG332" # Dobrich
ess1$region[missing_regunit_bg & ess1$regionbg == 9] <- "BG425" # Kurdjali
ess1$region[missing_regunit_bg & ess1$regionbg == 10] <- "BG415" # Kustendil
ess1$region[missing_regunit_bg & ess1$regionbg == 11] <- "BG315" # Lovetch
ess1$region[missing_regunit_bg & ess1$regionbg == 12] <- "BG312" # Montana
ess1$region[missing_regunit_bg & ess1$regionbg == 13] <- "BG423" # Pazardjik
ess1$region[missing_regunit_bg & ess1$regionbg == 14] <- "BG414" # Pernik
ess1$region[missing_regunit_bg & ess1$regionbg == 15] <- "BG314" # Pleven
ess1$region[missing_regunit_bg & ess1$regionbg == 16] <- "BG421" # Plovdiv
ess1$region[missing_regunit_bg & ess1$regionbg == 17] <- "BG324" # Razgrad
ess1$region[missing_regunit_bg & ess1$regionbg == 18] <- "BG323" # Rouse
ess1$region[missing_regunit_bg & ess1$regionbg == 19] <- "BG325" # Silistra
ess1$region[missing_regunit_bg & ess1$regionbg == 20] <- "BG342" # Sliven
ess1$region[missing_regunit_bg & ess1$regionbg == 21] <- "BG424" # Smolian
ess1$region[missing_regunit_bg & ess1$regionbg == 22] <- "BG411" # Sofia (capital)
ess1$region[missing_regunit_bg & ess1$regionbg == 23] <- "BG412" # Sofia-region
ess1$region[missing_regunit_bg & ess1$regionbg == 24] <- "BG344" # Stara Zagora
ess1$region[missing_regunit_bg & ess1$regionbg == 25] <- "BG334" # Targovishte
ess1$region[missing_regunit_bg & ess1$regionbg == 26] <- "BG422" # Haskovo
ess1$region[missing_regunit_bg & ess1$regionbg == 27] <- "BG333" # Shoumen
ess1$region[missing_regunit_bg & ess1$regionbg == 28] <- "BG343" # Iambol
ess1$region[missing_regunit_bg & ess1$regionbg == 999] <- NA # Not available 

# Switzerland # -> round 1, 2, 3, and 4 and NUTS level 2
missing_regunit_ch <- ess1$essround %in% c(1, 2, 3, 4) & ess1$cntry == "CH"
ess1$regunit[missing_regunit_ch] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_ch & ess1$regioach == 1] <- "CH01" # Région lémanique
ess1$region[missing_regunit_ch & ess1$regioach == 2] <- "CH02" # Espace Mittelland
ess1$region[missing_regunit_ch & ess1$regioach == 3] <- "CH03" # Nordwestschweiz
ess1$region[missing_regunit_ch & ess1$regioach == 4] <- "CH04" # Zürich
ess1$region[missing_regunit_ch & ess1$regioach == 5] <- "CH05" # Ostschweiz
ess1$region[missing_regunit_ch & ess1$regioach == 6] <- "CH06" # Zentralschweiz
ess1$region[missing_regunit_ch & ess1$regioach == 7] <- "CH07" # Ticino
ess1$region[missing_regunit_ch & ess1$regioach == 999] <- NA # Not available 

# Cyprus # -> round 3 and 4 and NUTS level 1
missing_regunit_cy <- is.na(ess1$regunit) & (ess1$essround %in% c(3, 4)) & (ess1$cntry == "CY")
ess1$region[missing_regunit_cy] <- "CY0"
ess1$regunit[missing_regunit_cy] <- 1 #Assign NUTS level 1

# Czech Republic # -> round 4 and NUTS level 2 (regioacz)
missing_regunit_cz <- is.na(ess1$regunit) & (ess1$essround %in% c(4)) & (ess1$cntry == "CZ")
ess1$regunit[missing_regunit_cz] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_cz & ess1$regioacz == 1] <- "CZ01" # Praha
ess1$region[missing_regunit_cz & ess1$regioacz == 2] <- "CZ02" # Stredni Cechy
ess1$region[missing_regunit_cz & ess1$regioacz == 3] <- "CZ03" # Jihozapad
ess1$region[missing_regunit_cz & ess1$regioacz == 4] <- "CZ04" # Severozapad
ess1$region[missing_regunit_cz & ess1$regioacz == 5] <- "CZ05" # Severovychod
ess1$region[missing_regunit_cz & ess1$regioacz == 6] <- "CZ06" # Jihovychod
ess1$region[missing_regunit_cz & ess1$regioacz == 7] <- "CZ07" # Stredni Morava
ess1$region[missing_regunit_cz & ess1$regioacz == 8] <- "CZ08" # Moravskoslezsko
ess1$region[missing_regunit_cz & ess1$regioacz == 999] <- NA # not available

# Czech Republic # -> round 1 and 2 and NUTS level 3 (regioncz)
missing_regunit_cz <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2)) & (ess1$cntry == "CZ")
ess1$regunit[missing_regunit_cz] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_cz & ess1$regioncz == 1] <- "CZ010" #Prague
ess1$region[missing_regunit_cz & ess1$regioncz == 2] <- "CZ020" #Central Bohemia
ess1$region[missing_regunit_cz & ess1$regioncz == 3] <- "CZ031" #South Bohemia
ess1$region[missing_regunit_cz & ess1$regioncz == 4] <- "CZ032" #Plzen Reg. Bohemia
ess1$region[missing_regunit_cz & ess1$regioncz == 5] <- "CZ041" #Karlovy Vary Reg. 
ess1$region[missing_regunit_cz & ess1$regioncz == 6] <- "CZ042" #Usti Reg.
ess1$region[missing_regunit_cz & ess1$regioncz == 7] <- "CZ051" #Liberec Reg.
ess1$region[missing_regunit_cz & ess1$regioncz == 8] <- "CZ052" #Hradec Kralove Reg.
ess1$region[missing_regunit_cz & ess1$regioncz == 9] <- "CZ053" #Pardubice Reg.
ess1$region[missing_regunit_cz & ess1$regioncz == 10] <- "CZ063" #Vysocina
ess1$region[missing_regunit_cz & ess1$regioncz == 11] <- "CZ064" #South Moravia
ess1$region[missing_regunit_cz & ess1$regioncz == 12] <- "CZ071" #Olomouc Reg.
ess1$region[missing_regunit_cz & ess1$regioncz == 13] <- "CZ072" #Zlin Reg.
ess1$region[missing_regunit_cz & ess1$regioncz == 14] <- "CZ080" #Moravian Silesia Reg.
ess1$region[missing_regunit_cz & ess1$regioncz == 999] <- NA # not available

# Germany # -> round 1, 2, 3 and 4 and NUTS level 1
missing_regunit_de <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "DE")
ess1$regunit[missing_regunit_de] <- 1 #Assign NUTS level 1
ess1$region[missing_regunit_de & ess1$regionde == 1] <- "DEF" # Schleswig-Holstein
ess1$region[missing_regunit_de & ess1$regionde == 2] <- "DE6" # Hamburg
ess1$region[missing_regunit_de & ess1$regionde == 3] <- "DE9" # Niedersachsen
ess1$region[missing_regunit_de & ess1$regionde == 4] <- "DE5" # Bremen
ess1$region[missing_regunit_de & ess1$regionde == 5] <- "DEA" # Nordrhein-Westfalen
ess1$region[missing_regunit_de & ess1$regionde == 6] <- "DE7" # Hessen
ess1$region[missing_regunit_de & ess1$regionde == 7] <- "DEB" # Rheinland-Pfalz
ess1$region[missing_regunit_de & ess1$regionde == 8] <- "DE1" # Baden-Württemberg
ess1$region[missing_regunit_de & ess1$regionde == 9] <- "DE2" # Bayern
ess1$region[missing_regunit_de & ess1$regionde == 10] <- "DEC" # Saarland
ess1$region[missing_regunit_de & ess1$regionde == 11] <- "DE3" # Berlin
ess1$region[missing_regunit_de & ess1$regionde == 12] <- "DE4" # Brandenburg
ess1$region[missing_regunit_de & ess1$regionde == 13] <- "DE8" # Mecklenburg-Vorpommern
ess1$region[missing_regunit_de & ess1$regionde == 14] <- "DED" # Sachsen
ess1$region[missing_regunit_de & ess1$regionde == 15] <- "DEE" # Sachsen-Anhalt
ess1$region[missing_regunit_de & ess1$regionde == 16] <- "DEG" # Thüringen
ess1$region[missing_regunit_de & ess1$regionde == 999] <- NA # not available

# Denmark # -> round 4 and NUTS level 2 (regioadk)
missing_regunit_dk <- is.na(ess1$regunit) & (ess1$essround %in% c(4)) & (ess1$cntry == "DK")
ess1$regunit[missing_regunit_dk] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_dk & ess1$regioadk == 1] <- "DK01" # Hovedstaden
ess1$region[missing_regunit_dk & ess1$regioadk == 2] <- "DK02" # Sjælland
ess1$region[missing_regunit_dk & ess1$regioadk == 3] <- "DK03" # Syddanmark
ess1$region[missing_regunit_dk & ess1$regioadk == 4] <- "DK04" # Midtjylland
ess1$region[missing_regunit_dk & ess1$regioadk == 5] <- "DK05" # Nordjylland
ess1$region[missing_regunit_dk & ess1$regioadk == 999] <- NA

# Denmark # -> round 1, 2 and 3 and NUTS level 3 (regiondk)
missing_regunit_dk <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3)) & (ess1$cntry == "DK")
ess1$regunit[missing_regunit_dk] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_dk & ess1$regiondk == 1] <- "DK01" # Københavns og Frederiksberg Kommune (Hovedstaden)
ess1$region[missing_regunit_dk & ess1$regiondk == 2] <- "DK01" # Københavns Amt (Hovedstaden)
ess1$region[missing_regunit_dk & ess1$regiondk == 3] <- "DK01" # Frederiksborg Amt (Hovedstaden)
ess1$region[missing_regunit_dk & ess1$regiondk == 4] <- "DK02" # Roskilde Amt (Sjælland)
ess1$region[missing_regunit_dk & ess1$regiondk == 5] <- "DK02" # Vestsjællands Amt (Sjælland)
ess1$region[missing_regunit_dk & ess1$regiondk == 6] <- "DK02" # Storstrøms Amt (Sydsjælland)
ess1$region[missing_regunit_dk & ess1$regiondk == 7] <- "DK01" # Bornholms Amt (Hovedstaden)
ess1$region[missing_regunit_dk & ess1$regiondk == 8] <- "DK03" # Fyns Amt (Syddanmark)
ess1$region[missing_regunit_dk & ess1$regiondk == 9] <- "DK03" # Sønderjyllands Amt (Syddanmark)
ess1$region[missing_regunit_dk & ess1$regiondk == 10] <- "DK03" # Ribe Amt (Syddanmark)
ess1$region[missing_regunit_dk & ess1$regiondk == 11] <- "DK03" # Vejle Amt (Syddanmark)
ess1$region[missing_regunit_dk & ess1$regiondk == 12] <- "DK04" # Ringkøbing Amt (Midtjylland)
ess1$region[missing_regunit_dk & ess1$regiondk == 13] <- "DK04" # Århus Amt (Midtjylland)
ess1$region[missing_regunit_dk & ess1$regiondk == 14] <- "DK04" # Viborg Amt (Midtjylland)
ess1$region[missing_regunit_dk & ess1$regiondk == 15] <- "DK05" # Nordjyllands Amt (Nordjylland)
ess1$region[missing_regunit_dk & ess1$regiondk == 999] <- NA


# Estonia # -> round 2, 3 and 4 and NUTS level 3
missing_regunit_ee <- is.na(ess1$regunit) & (ess1$essround %in% c(2, 3, 4)) & (ess1$cntry == "EE")
ess1$region[missing_regunit_ee & ess1$regionee == 1] <- "EE001" # Põhja-Eesti
ess1$region[missing_regunit_ee & ess1$regionee == 4] <- "EE004" # Lääne-Eesti
ess1$region[missing_regunit_ee & ess1$regionee == 6] <- "EE006" # Kesk-Eesti
ess1$region[missing_regunit_ee & ess1$regionee == 7] <- "EE007" # Kirde-Eesti
ess1$region[missing_regunit_ee & ess1$regionee == 8] <- "EE008" # Lõuna-Eesti
ess1$region[missing_regunit_ee & ess1$regionee == 999] <- NA 
ess1$regunit[missing_regunit_ee] <- 3 #Assign NUTS level 3

# Spain # -> round 3 and 4 and NUTS level 2 (regioaes) 
missing_regunit_es1 <- ess1$essround %in% c(3, 4) & ess1$cntry == "ES"
ess1$regunit[missing_regunit_es1] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_es1 & ess1$regioaes == 11] <- "ES11" # Galicia
ess1$region[missing_regunit_es1 & ess1$regioaes == 12] <- "ES12" # Principado de Asturias
ess1$region[missing_regunit_es1 & ess1$regioaes == 13] <- "ES13" # Cantabria
ess1$region[missing_regunit_es1 & ess1$regioaes == 21] <- "ES21" # País Vasco
ess1$region[missing_regunit_es1 & ess1$regioaes == 22] <- "ES22" # Comunidad Foral de Navarra
ess1$region[missing_regunit_es1 & ess1$regioaes == 23] <- "ES23" # La Rioja
ess1$region[missing_regunit_es1 & ess1$regioaes == 24] <- "ES24" # Aragón
ess1$region[missing_regunit_es1 & ess1$regioaes == 30] <- "ES30" # Comunidad de Madrid
ess1$region[missing_regunit_es1 & ess1$regioaes == 41] <- "ES41" # Castilla y León
ess1$region[missing_regunit_es1 & ess1$regioaes == 42] <- "ES42" # Castilla-La Mancha
ess1$region[missing_regunit_es1 & ess1$regioaes == 43] <- "ES43" # Extremadura
ess1$region[missing_regunit_es1 & ess1$regioaes == 51] <- "ES51" # Cataluña
ess1$region[missing_regunit_es1 & ess1$regioaes == 52] <- "ES52" # Comunidad Valenciana
ess1$region[missing_regunit_es1 & ess1$regioaes == 53] <- "ES53" # Illes Balears
ess1$region[missing_regunit_es1 & ess1$regioaes == 61] <- "ES61" # Andalucía
ess1$region[missing_regunit_es1 & ess1$regioaes == 62] <- "ES62" # Región de Murcia
ess1$region[missing_regunit_es1 & ess1$regioaes == 63] <- "ES63" # Ciudad Autónoma de Ceuta
ess1$region[missing_regunit_es1 & ess1$regioaes == 64] <- "ES64" # Ciudad Autónoma de Melilla
ess1$region[missing_regunit_es1 & ess1$regioaes == 70] <- "ES70" # Canarias
ess1$region[missing_regunit_es1 & ess1$regioaes == 999] <- NA

# Spain # -> round 1 and 2 and NUTS level 2 (regiones) 
missing_regunit_es <- ess1$essround %in% c(1, 2) & ess1$cntry == "ES"
ess1$regunit[missing_regunit_es] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_es & ess1$regiones == 11] <- "ES11" # Galicia
ess1$region[missing_regunit_es & ess1$regiones == 12] <- "ES12" # Principado de Asturias
ess1$region[missing_regunit_es & ess1$regiones == 13] <- "ES13" # Cantabria
ess1$region[missing_regunit_es & ess1$regiones == 21] <- "ES21" # País Vasco
ess1$region[missing_regunit_es & ess1$regiones == 22] <- "ES22" # Comunidad Foral de Navarra
ess1$region[missing_regunit_es & ess1$regiones == 23] <- "ES23" # La Rioja
ess1$region[missing_regunit_es & ess1$regiones == 24] <- "ES24" # Aragón
ess1$region[missing_regunit_es & ess1$regiones == 30] <- "ES30" # Comunidad de Madrid
ess1$region[missing_regunit_es & ess1$regiones == 41] <- "ES41" # Castilla y León
ess1$region[missing_regunit_es & ess1$regiones == 42] <- "ES42" # Castilla-La Mancha
ess1$region[missing_regunit_es & ess1$regiones == 43] <- "ES43" # Extremadura
ess1$region[missing_regunit_es & ess1$regiones == 51] <- "ES51" # Cataluña
ess1$region[missing_regunit_es & ess1$regiones == 52] <- "ES52" # Comunidad Valenciana
ess1$region[missing_regunit_es & ess1$regiones == 53] <- "ES53" # Illes Balears
ess1$region[missing_regunit_es & ess1$regiones == 61] <- "ES61" # Andalucía
ess1$region[missing_regunit_es & ess1$regiones == 62] <- "ES62" # Región de Murcia
ess1$region[missing_regunit_es & ess1$regiones == 63] <- "ES63" # Ciudad Autónoma de Ceuta
ess1$region[missing_regunit_es & ess1$regiones == 64] <- "ES64" # Ciudad Autónoma de Melilla
ess1$region[missing_regunit_es & ess1$regiones == 70] <- "ES70" # Canarias
ess1$region[missing_regunit_es & ess1$regiones == 999] <- NA
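
The line-by-line recodes above can also be written as a single lookup. The following is a minimal sketch, assuming the survey codes are plain numerics; `toy` and `es_lookup` are hypothetical names introduced for illustration, and the toy values are not real ESS records.

```r
# Toy data standing in for ess1 (hypothetical values, not real ESS records)
toy <- data.frame(
  cntry    = c("ES", "ES", "ES", "FR"),
  essround = c(1, 2, 1, 1),
  regiones = c(11, 30, 999, 11),
  region   = NA_character_,
  stringsAsFactors = FALSE
)

# Crosswalk from survey code to NUTS level 2 code (only a few entries shown)
es_lookup <- c("11" = "ES11", "30" = "ES30", "70" = "ES70")

# Rows to recode: Spain, rounds 1 and 2
is_es <- toy$essround %in% c(1, 2) & toy$cntry == "ES"

# Index the named vector by the survey code; codes without an entry
# (such as 999) come back as NA
toy$region[is_es] <- unname(es_lookup[as.character(toy$regiones[is_es])])
toy
```

Keeping each crosswalk in one named vector puts every source code in exactly one place, which may make typos in the target codes easier to spot than in a long run of assignments.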

# United Kingdom # -> round 1, 2, 3 and 4 for NUTS level 1
missing_regunit_gb <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "GB")
ess1$regunit[missing_regunit_gb] <- 1 #Assign NUTS level 1
ess1$region[missing_regunit_gb & ess1$regiongb == 1] <- "UKC" # North East 
ess1$region[missing_regunit_gb & ess1$regiongb == 2] <- "UKD" # North West 
ess1$region[missing_regunit_gb & ess1$regiongb == 3] <- "UKE" # Yorkshire and The Humber 
ess1$region[missing_regunit_gb & ess1$regiongb == 4] <- "UKF" # East Midlands 
ess1$region[missing_regunit_gb & ess1$regiongb == 5] <- "UKG" # West Midlands 
ess1$region[missing_regunit_gb & ess1$regiongb == 6] <- "UKK" # South West  
ess1$region[missing_regunit_gb & ess1$regiongb == 7] <- "UKH" # East of England 
ess1$region[missing_regunit_gb & ess1$regiongb == 8] <- "UKI" # London 
ess1$region[missing_regunit_gb & ess1$regiongb == 9] <- "UKJ" # South East  
ess1$region[missing_regunit_gb & ess1$regiongb == 10] <- "UKL" # Wales
ess1$region[missing_regunit_gb & ess1$regiongb == 11] <- "UKM" # Scotland 
ess1$region[missing_regunit_gb & ess1$regiongb == 12] <- "UKN" # Northern Ireland
ess1$region[missing_regunit_gb & ess1$regiongb == 999] <- NA

# Slovakia # -> round 2, 3 and 4 for NUTS level 3 
missing_regunit_sk <- is.na(ess1$regunit) & (ess1$essround %in% c(2, 3, 4)) & (ess1$cntry == "SK")
ess1$regunit[missing_regunit_sk] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_sk & ess1$regionsk == 1] <- "SK010" # Bratislava Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 2] <- "SK021" # Trnava Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 3] <- "SK022" # Trencin Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 4] <- "SK023" # Nitra Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 5] <- "SK031" # Zilina Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 6] <- "SK032" # Banska Bystrica Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 7] <- "SK041" # Presov Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 8] <- "SK042" # Kosice Reg.
ess1$region[missing_regunit_sk & ess1$regionsk == 999] <- NA

# Sweden # -> round 1, 2, 3 and 4 for NUTS level 2 
missing_regunit_se <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "SE")
ess1$regunit[missing_regunit_se] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_se & ess1$regionse == 1] <- "SE11" # Stockholm 
ess1$region[missing_regunit_se & ess1$regionse == 2] <- "SE12" # Östra Mellansverige 
ess1$region[missing_regunit_se & ess1$regionse == 3] <- "SE22" # Sydsverige 
ess1$region[missing_regunit_se & ess1$regionse == 4] <- "SE31" # Norra Mellansverige 
ess1$region[missing_regunit_se & ess1$regionse == 5] <- "SE32" # Mellersta Norrland 
ess1$region[missing_regunit_se & ess1$regionse == 6] <- "SE33" # Övre Norrland
ess1$region[missing_regunit_se & ess1$regionse == 7] <- "SE21" # Småland med Öarna
ess1$region[missing_regunit_se & ess1$regionse == 8] <- "SE23" # Västsverige

# Romania # -> round 3 and 4 for NUTS level 2 
missing_regunit_ro <- is.na(ess1$regunit) & (ess1$essround %in% c(3, 4)) & (ess1$cntry == "RO")
ess1$regunit[missing_regunit_ro] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_ro & ess1$regionro == 11] <- "RO11" # Nord-Vest 
ess1$region[missing_regunit_ro & ess1$regionro == 12] <- "RO12" # Centru 
ess1$region[missing_regunit_ro & ess1$regionro == 21] <- "RO21" # Nord-Est 
ess1$region[missing_regunit_ro & ess1$regionro == 22] <- "RO22" # Sud-Est 
ess1$region[missing_regunit_ro & ess1$regionro == 31] <- "RO31" # Sud-Muntenia
ess1$region[missing_regunit_ro & ess1$regionro == 32] <- "RO32" # Bucuresti-Ilfov
ess1$region[missing_regunit_ro & ess1$regionro == 41] <- "RO41" # Sud-Vest Oltenia
ess1$region[missing_regunit_ro & ess1$regionro == 42] <- "RO42" # Vest

# Poland # -> round 1, 2, 3, and 4 for NUTS level 2
missing_regunit_pl <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "PL")
ess1$regunit[missing_regunit_pl] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_pl & ess1$regionpl == 2] <- "PL51" # Dolnoslaskie
ess1$region[missing_regunit_pl & ess1$regionpl == 4] <- "PL61" # Kujawsko-pomorskie
ess1$region[missing_regunit_pl & ess1$regionpl == 6] <- "PL81" # Lubelskie
ess1$region[missing_regunit_pl & ess1$regionpl == 8] <- "PL43" # Lubuskie
ess1$region[missing_regunit_pl & ess1$regionpl == 10] <- "PL71" # Lodzkie
ess1$region[missing_regunit_pl & ess1$regionpl == 12] <- "PL21" # Malopolskie
ess1$region[missing_regunit_pl & ess1$regionpl == 14] <- "PL92" # Mazowieckie
ess1$region[missing_regunit_pl & ess1$regionpl == 16] <- "PL52" # Opolskie
ess1$region[missing_regunit_pl & ess1$regionpl == 18] <- "PL82" # Podkarpackie
ess1$region[missing_regunit_pl & ess1$regionpl == 20] <- "PL84" # Podlaskie
ess1$region[missing_regunit_pl & ess1$regionpl == 22] <- "PL63" # Pomorskie
ess1$region[missing_regunit_pl & ess1$regionpl == 24] <- "PL22" # Slaskie
ess1$region[missing_regunit_pl & ess1$regionpl == 26] <- "PL72" # Swietokrzyskie
ess1$region[missing_regunit_pl & ess1$regionpl == 28] <- "PL62" # Warminsko-mazurskie
ess1$region[missing_regunit_pl & ess1$regionpl == 30] <- "PL41" # Wielkopolskie
ess1$region[missing_regunit_pl & ess1$regionpl == 32] <- "PL42" # Zachodniopomorskie
ess1$region[missing_regunit_pl & ess1$regionpl == 999] <- NA

# Netherlands # -> round 1, 2, 3 and 4 for NUTS level 3 
missing_regunit_nl <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "NL")
ess1$regunit[missing_regunit_nl] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_nl & ess1$regionnl == 111] <- "NL111" # Oost-Groningen
ess1$region[missing_regunit_nl & ess1$regionnl == 112] <- "NL112" # Delfzijl en omgeving
ess1$region[missing_regunit_nl & ess1$regionnl == 113] <- "NL113" # Overig Groningen
ess1$region[missing_regunit_nl & ess1$regionnl == 121] <- "NL124" # Noord-Friesland
ess1$region[missing_regunit_nl & ess1$regionnl == 122] <- "NL125" # Zuidwest-Friesland
ess1$region[missing_regunit_nl & ess1$regionnl == 123] <- "NL126" # Zuidoost-Friesland
ess1$region[missing_regunit_nl & ess1$regionnl == 131] <- "NL131" # Noord-Drenthe
ess1$region[missing_regunit_nl & ess1$regionnl == 132] <- "NL132" # Zuidoost-Drenthe
ess1$region[missing_regunit_nl & ess1$regionnl == 133] <- "NL133" # Zuidwest-Drenthe
ess1$region[missing_regunit_nl & ess1$regionnl == 211] <- "NL211" # Noord-Overijssel
ess1$region[missing_regunit_nl & ess1$regionnl == 212] <- "NL212" # Zuidwest-Overijssel
ess1$region[missing_regunit_nl & ess1$regionnl == 213] <- "NL213" # Twente
ess1$region[missing_regunit_nl & ess1$regionnl == 221] <- "NL221" # Veluwe
ess1$region[missing_regunit_nl & ess1$regionnl == 222] <- "NL225" # Achterhoek
ess1$region[missing_regunit_nl & ess1$regionnl == 223] <- "NL226" # Arnhem/Nijmegen
ess1$region[missing_regunit_nl & ess1$regionnl == 224] <- "NL224" # Zuidwest-Gelderland
ess1$region[missing_regunit_nl & ess1$regionnl == 230] <- "NL230" # Flevoland
ess1$region[missing_regunit_nl & ess1$regionnl == 310] <- "NL310" # Utrecht
ess1$region[missing_regunit_nl & ess1$regionnl == 321] <- "NL321" # Kop van Noord-Holland
ess1$region[missing_regunit_nl & ess1$regionnl == 322] <- "NL328" # Alkmaar en omgeving
ess1$region[missing_regunit_nl & ess1$regionnl == 323] <- "NL323" # IJmond
ess1$region[missing_regunit_nl & ess1$regionnl == 324] <- "NL324" # Agglomeratie Haarlem
ess1$region[missing_regunit_nl & ess1$regionnl == 325] <- "NL325" # Zaanstreek
ess1$region[missing_regunit_nl & ess1$regionnl == 326] <- "NL329" # Groot-Amsterdam
ess1$region[missing_regunit_nl & ess1$regionnl == 327] <- "NL327" # Het Gooi en Vechtstreek
ess1$region[missing_regunit_nl & ess1$regionnl == 331] <- "NL337" # Agglomeratie Leiden en Bollenstreek
ess1$region[missing_regunit_nl & ess1$regionnl == 332] <- "NL332" # Agglomeratie 's-Gravenhage
ess1$region[missing_regunit_nl & ess1$regionnl == 333] <- "NL333" # Delft en Westland
ess1$region[missing_regunit_nl & ess1$regionnl == 334] <- "NL33B" # Oost-Zuid-Holland
ess1$region[missing_regunit_nl & ess1$regionnl == 335] <- "NL33C" # Groot-Rijnmond
ess1$region[missing_regunit_nl & ess1$regionnl == 336] <- "NL33A" # Zuidoost-Zuid-Holland
ess1$region[missing_regunit_nl & ess1$regionnl == 341] <- "NL341" # Zeeuwsch-Vlaanderen
ess1$region[missing_regunit_nl & ess1$regionnl == 342] <- "NL342" # Overig Zeeland
ess1$region[missing_regunit_nl & ess1$regionnl == 411] <- "NL411" # West-Noord-Brabant
ess1$region[missing_regunit_nl & ess1$regionnl == 412] <- "NL412" # Midden-Noord-Brabant
ess1$region[missing_regunit_nl & ess1$regionnl == 413] <- "NL413" # Noordoost-Noord-Brabant
ess1$region[missing_regunit_nl & ess1$regionnl == 414] <- "NL414" # Zuidoost-Noord-Brabant
ess1$region[missing_regunit_nl & ess1$regionnl == 421] <- "NL421" # Noord-Limburg
ess1$region[missing_regunit_nl & ess1$regionnl == 422] <- "NL422" # Midden-Limburg
ess1$region[missing_regunit_nl & ess1$regionnl == 423] <- "NL423" # Zuid-Limburg
ess1$region[missing_regunit_nl & ess1$regionnl == 999] <- NA

# Luxembourg # round 1 and 2 for NUTS 1
missing_regunit_lu <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2)) & (ess1$cntry == "LU")
ess1$regunit[missing_regunit_lu] <- 1 #Assign NUTS level 1
ess1$region[missing_regunit_lu & ess1$regionlu == 1] <- "LU0" 

# Latvia # -> round 3 and 4 for NUTS level 3 
missing_regunit_lv <- is.na(ess1$regunit) & (ess1$essround %in% c(3, 4)) & (ess1$cntry == "LV")
ess1$regunit[missing_regunit_lv] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_lv & ess1$regionlv == 1] <- "LV003" # Kurzeme
ess1$region[missing_regunit_lv & ess1$regionlv == 2] <- "LV005" # Latgale
ess1$region[missing_regunit_lv & ess1$regionlv == 3] <- "LV006" # Riga
ess1$region[missing_regunit_lv & ess1$regionlv == 4] <- "LV007" # Pieriga
ess1$region[missing_regunit_lv & ess1$regionlv == 5] <- "LV008" # Vidzeme
ess1$region[missing_regunit_lv & ess1$regionlv == 6] <- "LV009" # Zemgale
ess1$region[missing_regunit_lv & ess1$regionlv == 999] <- NA

# Lithuania # round 4 and NUTS level 3
missing_regunit_lt <- is.na(ess1$regunit) & (ess1$essround %in% c(4)) & (ess1$cntry == "LT")
ess1$regunit[missing_regunit_lt] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_lt & ess1$regionlt == 1] <- "LT021" #Alytus County
ess1$region[missing_regunit_lt & ess1$regionlt == 2] <- "LT022" #Kaunas county
ess1$region[missing_regunit_lt & ess1$regionlt == 3] <- "LT023" #Klaipėda county
ess1$region[missing_regunit_lt & ess1$regionlt == 4] <- "LT024" #Marijampolė county
ess1$region[missing_regunit_lt & ess1$regionlt == 5] <- "LT025" #Panevėžys county
ess1$region[missing_regunit_lt & ess1$regionlt == 6] <- "LT026" #Šiauliai county
ess1$region[missing_regunit_lt & ess1$regionlt == 7] <- "LT027" #Tauragė county
ess1$region[missing_regunit_lt & ess1$regionlt == 8] <- "LT028" #Telšiai county
ess1$region[missing_regunit_lt & ess1$regionlt == 9] <- "LT029" #Utena county
ess1$region[missing_regunit_lt & ess1$regionlt == 10] <- "LT011" #Vilnius county
ess1$region[missing_regunit_lt & ess1$regionlt == 999] <- NA

# Italy # -> round 1 and 2 for NUTS level 2
missing_regunit_it <- ess1$essround %in% c(1, 2) & ess1$cntry == "IT"
ess1$regunit[missing_regunit_it] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_it & ess1$regionit == 1] <- "ITC1" # Piemonte
ess1$region[missing_regunit_it & ess1$regionit == 2] <- "ITC2" # Valle d'Aosta
ess1$region[missing_regunit_it & ess1$regionit == 3] <- "ITC4" # Lombardia
ess1$region[missing_regunit_it & ess1$regionit == 4] <- "ITH2" # Trentino-Alto Adige
ess1$region[missing_regunit_it & ess1$regionit == 5] <- "ITH3" # Veneto
ess1$region[missing_regunit_it & ess1$regionit == 6] <- "ITH4" # Friuli-Venezia Giulia
ess1$region[missing_regunit_it & ess1$regionit == 7] <- "ITC3" # Liguria
ess1$region[missing_regunit_it & ess1$regionit == 8] <- "ITH5" # Emilia-Romagna
ess1$region[missing_regunit_it & ess1$regionit == 9] <- "ITI1" # Toscana
ess1$region[missing_regunit_it & ess1$regionit == 10] <- "ITI2" # Umbria
ess1$region[missing_regunit_it & ess1$regionit == 11] <- "ITI3" # Marche
ess1$region[missing_regunit_it & ess1$regionit == 12] <- "ITI4" # Lazio
ess1$region[missing_regunit_it & ess1$regionit == 13] <- "ITF1" # Abruzzo
ess1$region[missing_regunit_it & ess1$regionit == 14] <- "ITF2" # Molise
ess1$region[missing_regunit_it & ess1$regionit == 15] <- "ITF3" # Campania
ess1$region[missing_regunit_it & ess1$regionit == 16] <- "ITF4" # Puglia
ess1$region[missing_regunit_it & ess1$regionit == 17] <- "ITF5" # Basilicata
ess1$region[missing_regunit_it & ess1$regionit == 18] <- "ITF6" # Calabria
ess1$region[missing_regunit_it & ess1$regionit == 19] <- "ITG1" # Sicilia
ess1$region[missing_regunit_it & ess1$regionit == 20] <- "ITG2" # Sardegna

# Iceland # -> round 2 and NUTS level 1
missing_regunit_is <- is.na(ess1$regunit) & (ess1$essround %in% c(2)) & (ess1$cntry == "IS")
ess1$regunit[missing_regunit_is] <- 1 #Assign NUTS level 1
ess1$region[missing_regunit_is & ess1$regionis == 1] <- "IS0"  

# Portugal # -> round 4 for NUTS level 2 (regioapt)
missing_regunit_pt <- is.na(ess1$regunit) & (ess1$essround %in% c(4)) & (ess1$cntry == "PT")
ess1$regunit[missing_regunit_pt] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_pt & ess1$regioapt == 1] <- "PT11" # Norte
ess1$region[missing_regunit_pt & ess1$regioapt == 2] <- "PT16" # Centro
ess1$region[missing_regunit_pt & ess1$regioapt == 3] <- "PT17" # Lisbon
ess1$region[missing_regunit_pt & ess1$regioapt == 4] <- "PT18" # Alentejo
ess1$region[missing_regunit_pt & ess1$regioapt == 5] <- "PT15" # Algarve
ess1$region[missing_regunit_pt & ess1$regioapt == 999] <- NA

# Portugal # -> round 1, 2 and 3 for NUTS level 2
missing_regunit_pt <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3)) & (ess1$cntry == "PT")
ess1$regunit[missing_regunit_pt] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_pt & ess1$regionpt == 1] <- "PT11" # Norte
ess1$region[missing_regunit_pt & ess1$regionpt == 2] <- "PT16" # Centro
ess1$region[missing_regunit_pt & ess1$regionpt == 3] <- "PT17" # Lisbon e Vale do Tejo
ess1$region[missing_regunit_pt & ess1$regionpt == 4] <- "PT18" # Alentejo
ess1$region[missing_regunit_pt & ess1$regionpt == 5] <- "PT15" # Algarve
ess1$region[missing_regunit_pt & ess1$regionpt == 999] <- NA

# Finland # -> round 1 and NUTS level 2 
missing_regunit_fi1 <- is.na(ess1$regunit) & (ess1$essround %in% c(1)) & (ess1$cntry == "FI")
ess1$region[missing_regunit_fi1 & ess1$regionfi == 1] <- "FI1B" # Uusimaa
ess1$region[missing_regunit_fi1 & ess1$regionfi == 2] <- "FI1C" # Southern Finland and Åland
ess1$region[missing_regunit_fi1 & ess1$regionfi == 3] <- "FI1D" # Eastern Finland 
ess1$region[missing_regunit_fi1 & ess1$regionfi == 4] <- "FI19" # Mid Finland
ess1$region[missing_regunit_fi1 & ess1$regionfi == 5] <- "FI1D" # Northern Finland
ess1$regunit[missing_regunit_fi1] <- 2 # Assign NUTS level 2


# Finland # -> round 2,3 and 4 and NUTS level 2 
missing_regunit_fi <- is.na(ess1$regunit) & (ess1$essround %in% c(2, 3, 4)) & (ess1$cntry == "FI")
ess1$region[missing_regunit_fi & ess1$regionat == 1] <- "FI1C" # Southern Finland 
ess1$region[missing_regunit_fi & ess1$regionat == 2] <- "FI19" # Western Finland 
ess1$region[missing_regunit_fi & ess1$regionat == 3] <- "FI1D" # Eastern Finland 
ess1$region[missing_regunit_fi & ess1$regionat == 5] <- "FI1D" # Northern Finland

ess1$region[missing_regunit_fi & ess1$regionat == 999] <- NA # not available
ess1$regunit[missing_regunit_fi] <- 2 # Assign NUTS level 2

# Greece # -> round 1, 2 NUTS level 2 
missing_regunit_gr1 <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2)) & (ess1$cntry == "GR")
ess1$regunit[missing_regunit_gr1] <- 2 # Assign NUTS level 2
ess1$region[missing_regunit_gr1 & ess1$regiongr == 3] <- "EL30" # Attiki
ess1$region[missing_regunit_gr1 & ess1$regiongr == 11] <- "EL51" # Anatoliki Makedonia, Thraki
ess1$region[missing_regunit_gr1 & ess1$regiongr == 12] <- "EL52" # Kentriki Makedonia
ess1$region[missing_regunit_gr1 & ess1$regiongr == 13] <- "EL53" # Dytiki Makedonia
ess1$region[missing_regunit_gr1 & ess1$regiongr == 14] <- "EL61" # Thessalia
ess1$region[missing_regunit_gr1 & ess1$regiongr == 21] <- "EL54" # Ipeiros
ess1$region[missing_regunit_gr1 & ess1$regiongr == 22] <- "EL62" # Ionia Nissia
ess1$region[missing_regunit_gr1 & ess1$regiongr == 23] <- "EL63" # Dytiki Ellada
ess1$region[missing_regunit_gr1 & ess1$regiongr == 24] <- "EL64" # Sterea Ellada
ess1$region[missing_regunit_gr1 & ess1$regiongr == 25] <- "EL65" # Peloponnisos
ess1$region[missing_regunit_gr1 & ess1$regiongr == 41] <- "EL41" # Voreio Aigaio
ess1$region[missing_regunit_gr1 & ess1$regiongr == 42] <- "EL42" # Notio Aigaio
ess1$region[missing_regunit_gr1 & ess1$regiongr == 43] <- "EL43" # Kriti

# Greece # -> round 4 NUTS level 2 (EL41 and EL62 are missing)
missing_regunit_gr <- is.na(ess1$regunit) & (ess1$essround %in% c(4)) & (ess1$cntry == "GR")
ess1$region[missing_regunit_gr & ess1$regioagr == 1] <- "EL51" # East Macedonia & Thrace
ess1$region[missing_regunit_gr & ess1$regioagr == 2] <- "EL52" # Central Macedonia/Thessaloniki
ess1$region[missing_regunit_gr & ess1$regioagr == 3] <- "EL53" # West Macedonia & Epirus
ess1$region[missing_regunit_gr & ess1$regioagr == 4] <- "EL61" # Thessalia
ess1$region[missing_regunit_gr & ess1$regioagr == 5] <- "EL63" # West Greece & Ionian Islands
ess1$region[missing_regunit_gr & ess1$regioagr == 6] <- "EL64" # Central Greece
ess1$region[missing_regunit_gr & ess1$regioagr == 7] <- "EL65" # Peloponnese
ess1$region[missing_regunit_gr & ess1$regioagr == 8] <- "EL30" # Attica/Athens
ess1$region[missing_regunit_gr & ess1$regioagr == 9] <- "EL42" # Aegean Islands
ess1$region[missing_regunit_gr & ess1$regioagr == 10] <- "EL43" # Crete
ess1$regunit[missing_regunit_gr] <- 2 # Assign NUTS level 2
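
Because every recode line is written by hand, the same source code can end up on two lines, in which case the later assignment silently overwrites the earlier one. A minimal sketch of a check for this, assuming the crosswalk is first written down as a data frame; `gr_map` is a hypothetical, deliberately flawed table used only for illustration.

```r
# Hypothetical crosswalk: one row per recode line, with code 9 used twice on purpose
gr_map <- data.frame(
  code = c(7, 8, 9, 9),
  nuts = c("EL65", "EL30", "EL42", "EL43"),
  stringsAsFactors = FALSE
)

# Source codes that appear on more than one row
dup_codes <- unique(gr_map$code[duplicated(gr_map$code)])

# Show the conflicting rows so that they can be corrected by hand
gr_map[gr_map$code %in% dup_codes, ]
```

An empty `dup_codes` would mean that no source code is assigned twice.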

# Slovenia # -> round 1, 2, 3 and 4 for NUTS level 3
missing_regunit_si <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "SI")
ess1$regunit[missing_regunit_si] <- 3 #Assign NUTS level 3
ess1$region[missing_regunit_si & ess1$regionsi == 1] <- "SI042" # Gorenjska 
ess1$region[missing_regunit_si & ess1$regionsi == 2] <- "SI043" # Goriska 
ess1$region[missing_regunit_si & ess1$regionsi == 3] <- "SI037" # Jugovzhodna Slovenija 
ess1$region[missing_regunit_si & ess1$regionsi == 4] <- "SI033" # Koroska 
ess1$region[missing_regunit_si & ess1$regionsi == 5] <- "SI038" # Notranjsko-kraska
ess1$region[missing_regunit_si & ess1$regionsi == 6] <- "SI044" # Obalno-kraska 
ess1$region[missing_regunit_si & ess1$regionsi == 7] <- "SI041" # Osrednjeslovenska 
ess1$region[missing_regunit_si & ess1$regionsi == 8] <- "SI032" # Podravska 
ess1$region[missing_regunit_si & ess1$regionsi == 9] <- "SI031" # Pomurska 
ess1$region[missing_regunit_si & ess1$regionsi == 10] <- "SI034" # Savinjska 
ess1$region[missing_regunit_si & ess1$regionsi == 11] <- "SI036" # Spodnjeposavska (Posavska)
ess1$region[missing_regunit_si & ess1$regionsi == 12] <- "SI035" # Zasavska

# Croatia # -> round 4 and NUTS level 2
missing_regunit_hr <- is.na(ess1$regunit) & (ess1$essround %in% c(4)) & (ess1$cntry == "HR")
ess1$regunit[missing_regunit_hr] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_hr & ess1$regionhr == 1] <- "HR04" # Sjeverozapadna Hrvatska
ess1$region[missing_regunit_hr & ess1$regionhr == 2] <- "HR04" # Središnja i Istočna Hrvatska / Panonska Hrvatska
ess1$region[missing_regunit_hr & ess1$regionhr == 3] <- "HR03" # Jadranska Hrvatska
ess1$region[missing_regunit_hr & ess1$regionhr == 999] <- NA # Not available 

# Norway # -> round 1, 2, 3 and 4 for NUTS level 2
missing_regunit_no <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "NO")
ess1$regunit[missing_regunit_no] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_no & ess1$regionno == 1] <- "NO01" # Oslo and Akershus
ess1$region[missing_regunit_no & ess1$regionno == 2] <- "NO02" # Hedmark and Oppland
ess1$region[missing_regunit_no & ess1$regionno == 3] <- "NO03" # South Eastern Norway 
ess1$region[missing_regunit_no & ess1$regionno == 4] <- "NO04" # Agder and Rogaland
ess1$region[missing_regunit_no & ess1$regionno == 5] <- "NO05" # Western Norway (Vestlandet)
ess1$region[missing_regunit_no & ess1$regionno == 6] <- "NO06" # Trøndelag
ess1$region[missing_regunit_no & ess1$regionno == 7] <- "NO07" # Northern Norway
ess1$region[missing_regunit_no & ess1$regionno == 999] <- NA

# Hungary # -> round 1, 2, 3 and 4 and NUTS level 2 
missing_regunit_hu <- is.na(ess1$regunit) & (ess1$essround %in% c(1, 2, 3, 4)) & (ess1$cntry == "HU")
ess1$regunit[missing_regunit_hu] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_hu & ess1$regionhu == 1] <- "HU10" # Central Region
ess1$region[missing_regunit_hu & ess1$regionhu == 2] <- "HU21" # Middle Transdanubia
ess1$region[missing_regunit_hu & ess1$regionhu == 3] <- "HU22" # West Transdanubia
ess1$region[missing_regunit_hu & ess1$regionhu == 4] <- "HU23" # South Transdanubia
ess1$region[missing_regunit_hu & ess1$regionhu == 5] <- "HU31" # North Region
ess1$region[missing_regunit_hu & ess1$regionhu == 6] <- "HU32" # North Plain
ess1$region[missing_regunit_hu & ess1$regionhu == 7] <- "HU33" # South Plain
ess1$region[missing_regunit_hu & ess1$regionhu == 999] <- NA 

# Ireland # -> round 1 and 2 for NUTS level 2 (regionie) 
missing_regunit_ie1 <- ess1$essround %in% c(1, 2) & ess1$cntry == "IE"
ess1$regunit[missing_regunit_ie1] <- 2 #Assign NUTS level 2
ess1$region[missing_regunit_ie1 & ess1$regionie == 1] <- "IE04" # Border
ess1$region[missing_regunit_ie1 & ess1$regionie == 2] <- "IE06" # Midland
ess1$region[missing_regunit_ie1 & ess1$regionie == 3] <- "IE04" # West
ess1$region[missing_regunit_ie1 & ess1$regionie == 4] <- "IE06" # Dublin
ess1$region[missing_regunit_ie1 & ess1$regionie == 5] <- "IE06" # Mid-East
ess1$region[missing_regunit_ie1 & ess1$regionie == 6] <- "IE05" # Mid-West
ess1$region[missing_regunit_ie1 & ess1$regionie == 7] <- "IE05" # South East 
ess1$region[missing_regunit_ie1 & ess1$regionie == 8] <- "IE05" # South West 
ess1$region[missing_regunit_ie1 & ess1$regionie == 999] <- NA

### Now remove some observations because they do not correspond to the official NUTS categories 
## Ireland -> the ESS has a separate category for Dublin, which does not correspond to the official NUTS regions 
ess1 <- ess1[!(ess1$cntry == "IE" & ess1$essround %in% c(3, 4)), ]

## Finland -> 2 regions are combined in ESS rounds 2, 3 and 4
ess1 <- ess1[!(ess1$cntry == "FI" & ess1$essround %in% c(2, 3, 4)), ]
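
Row removal with a negated logical index behaves unexpectedly if the condition contains NA. The following is a minimal sketch on toy data, assuming a record whose country code is missing (whether this occurs in the ESS file is not known here); `toy`, `drop_eq` and `drop_in` are hypothetical names.

```r
# Toy data: the third record has a missing country code
toy <- data.frame(
  cntry    = c("IE", "IE", NA, "FI"),
  essround = c(3, 1, 3, 2),
  stringsAsFactors = FALSE
)

# `==` returns NA for the missing country, and an NA row index
# yields an all-NA row instead of the original record
drop_eq <- toy$cntry == "IE" & toy$essround %in% c(3, 4)
kept_eq <- toy[!drop_eq, ]

# `%in%` never returns NA, so the record with the missing country is kept unchanged
drop_in <- toy$cntry %in% "IE" & toy$essround %in% c(3, 4)
kept_in <- toy[!drop_in, ]
```

Both versions drop the Irish round 3 record, but only the `%in%` version preserves the values of the record with the missing country.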

## France
# not all NUTS level 2 regions are available in the ESS dataset, but I keep it to have more observations


## Greece
# Although not all NUTS level 2 regions are available in the ESS dataset, I decided to keep it to have more observations


# Fill in the missing values of the variable "regunit" in round 10
# Austria
missing_regunit_at <- is.na(ess1$regunit) & (ess1$essround %in% c(10)) & (ess1$cntry == "AT")
ess1$regunit[missing_regunit_at]<- 2 

# Cyprus 
missing_regunit_cy <- is.na(ess1$regunit) & (ess1$essround %in% c(10)) & (ess1$cntry == "CY")
ess1$regunit[missing_regunit_cy]<- 1

# Germany 
missing_regunit_de <- is.na(ess1$regunit) & (ess1$essround %in% c(10)) & (ess1$cntry == "DE")
ess1$regunit[missing_regunit_de]<- 1

# Spain 
missing_regunit_es <- is.na(ess1$regunit) & (ess1$essround %in% c(10)) & (ess1$cntry == "ES")
ess1$regunit[missing_regunit_es]<- 2

# Latvia 
missing_regunit_lv <- is.na(ess1$regunit) & (ess1$essround %in% c(10)) & (ess1$cntry == "LV")
ess1$regunit[missing_regunit_lv]<- 3

# Poland 
missing_regunit_pl <- is.na(ess1$regunit) & (ess1$essround %in% c(10)) & (ess1$cntry == "PL")
ess1$regunit[missing_regunit_pl]<- 2

# Sweden 
missing_regunit_se <- is.na(ess1$regunit) & (ess1$essround %in% c(10)) & (ess1$cntry == "SE")
ess1$regunit[missing_regunit_se]<- 2

# Remove the helper vectors from the environment to free memory
remove(missing_regunit_at, missing_regunit_be, missing_regunit_bg, 
       missing_regunit_ch, missing_regunit_cy, missing_regunit_cz, 
       missing_regunit_de, missing_regunit_dk, missing_regunit_ee, missing_regunit_es, 
       missing_regunit_es1, missing_regunit_sk, missing_regunit_si, missing_regunit_se, 
       missing_regunit_ro, missing_regunit_pt, missing_regunit_pl, missing_regunit_no, 
       missing_regunit_nl, missing_regunit_lv, missing_regunit_lu, missing_regunit_lt, 
       missing_regunit_it, missing_regunit_is, missing_regunit_ie1, missing_regunit_hu, 
       missing_regunit_hr,missing_regunit_fi, missing_regunit_fi1, missing_regunit_gb, 
       missing_regunit_gr, missing_regunit_gr1, missing_values)

# remove the columns from my dataset
ess1 <- ess1[, !(names(ess1) %in% c("regionat", "regionbe", "regionbg", "regioncy", 
                                    "regioncz", "regionde", "regiondk", "regionee", "regiones", 
                                    "regionfi", "regionfr", "regiongb", "regiongr", "regionhr", 
                                    "regionhu", "regionie", "regionis", "regionit", "regionlt", 
                                    "regionlu", "regionlv", "regionnl", "regionno", "regionpl", 
                                    "regionpt", "regionro", "regionse", "regionsi", "regionsk", 
                                    "regioach", "regioacz", "regioadk", "regiobie", "regioaes", 
                                    "regioafi", "regioapt", "regioagr", "regioaie"))]

# Remove Norway round 10 from the analysis 
ess1 <- ess1[!(ess1$cntry == "NO" & ess1$essround %in% c(10)), ]
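
Once `region` and `regunit` are both filled in, they can be checked against each other: a NUTS code consists of a two-letter country prefix plus one character per level, so its length should equal `2 + regunit`. A minimal sketch on toy data; `toy` and `mismatch` are hypothetical names, and the fourth row is inconsistent on purpose.

```r
# Toy data: "HR04" is a level 2 code but is labelled as level 3
toy <- data.frame(
  region  = c("ES11", "EE001", "UKC", "HR04", NA),
  regunit = c(2, 3, 1, 3, 2),
  stringsAsFactors = FALSE
)

# Expected code length: two-letter country prefix plus one character per NUTS level
expected_len <- 2 + toy$regunit

# Flag non-missing codes whose length disagrees with the stated level
mismatch <- !is.na(toy$region) & nchar(toy$region) != expected_len
toy[mismatch, ]
```

Applied to the full dataset, a table of the flagged rows by country and round would show where the assigned level and the code disagree.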


## Now I need to make sure that the region specification corresponds to the official NUTS 2016 classification! 
# No adjustments for Austria, Belgium, Bulgaria, Switzerland, Cyprus, Czech Republic, Germany, Iceland, Spain, 
# United Kingdom, Italy, Luxembourg, Latvia, Netherlands, Portugal, Romania, Sweden, Slovakia 

# Adjustments for Denmark, Estonia, Finland, France, Greece, Croatia, Hungary, Lithuania, Norway, Poland, Slovenia and Ireland 

## Estonia ##-> adjust different NUTS levels used in wave 10 
adjust_regunit_ee10 <- (ess1$essround %in% c(10)) & (ess1$cntry == "EE")
ess1$region[adjust_regunit_ee10 & ess1$region == "EE001"] <- "EE001" # Põhja-Eesti
ess1$region[adjust_regunit_ee10 & ess1$region == "EE004"] <- "EE004" # Lääne-Eesti
ess1$region[adjust_regunit_ee10 & ess1$region == "EE008"] <- "EE008" # Lõuna-Eesti
ess1$region[adjust_regunit_ee10 & ess1$region == "EE009"] <- "EE006" # Kesk-Eesti
ess1$region[adjust_regunit_ee10 & ess1$region == "EE00A"] <- "EE007" # Kirde-Eesti

## Ireland ## -> adjust different NUTS levels used in wave 5,6,7 and 8
adjust_regunit_ie <- (ess1$essround %in% c(5, 6, 7, 8)) & (ess1$cntry == "IE")
ess1$region[adjust_regunit_ie & ess1$region == "IE011"] <- "IE04" # Border
ess1$region[adjust_regunit_ie & ess1$region == "IE012"] <- "IE06" # Midland
ess1$region[adjust_regunit_ie & ess1$region == "IE013"] <- "IE04" # West
ess1$region[adjust_regunit_ie & ess1$region == "IE021"] <- "IE06" # Dublin
ess1$region[adjust_regunit_ie & ess1$region == "IE022"] <- "IE06" # Mid-East
ess1$region[adjust_regunit_ie & ess1$region == "IE023"] <- "IE05" # Mid-West
ess1$region[adjust_regunit_ie & ess1$region == "IE024"] <- "IE05" # South East
ess1$region[adjust_regunit_ie & ess1$region == "IE025"] <- "IE05" # South West

## Finland -> adjust different NUTS levels used in ESS round 5 
adjust_regunit_fi <- (ess1$essround %in% c(5)) & (ess1$cntry == "FI")
ess1$region[adjust_regunit_fi & ess1$region == "FI131"] <- "FI1D1" # Etelä-Savo
ess1$region[adjust_regunit_fi & ess1$region == "FI132"] <- "FI1D2" # Pohjois-Savo
ess1$region[adjust_regunit_fi & ess1$region == "FI133"] <- "FI1D3" # Pohjois-Karjala
ess1$region[adjust_regunit_fi & ess1$region == "FI134"] <- "FI1D8" # Kainuu
ess1$region[adjust_regunit_fi & ess1$region == "FI181"] <- "FI1B1" # Helsinki-Uusimaa
ess1$region[adjust_regunit_fi & ess1$region == "FI182"] <- "FI1B1" # Itä-Uusimaa
ess1$region[adjust_regunit_fi & ess1$region == "FI183"] <- "FI1C1" # Varsinais-Suomi
ess1$region[adjust_regunit_fi & ess1$region == "FI184"] <- "FI1C2" # Kanta-Häme
ess1$region[adjust_regunit_fi & ess1$region == "FI185"] <- "FI1C3" # Päijät-Häme
ess1$region[adjust_regunit_fi & ess1$region == "FI186"] <- "FI1C4" # Kymenlaakso
ess1$region[adjust_regunit_fi & ess1$region == "FI187"] <- "FI1C5" # Etelä-Karjala
ess1$region[adjust_regunit_fi & ess1$region == "FI1A1"] <- "FI1D5" # Keski-Pohjanmaa
ess1$region[adjust_regunit_fi & ess1$region == "FI1A2"] <- "FI1D9" # Pohjois-Pohjanmaa 
ess1$region[adjust_regunit_fi & ess1$region == "FI1A3"] <- "FI1D7" # Lappi

adjust_regunit_fi1 <- (ess1$essround %in% c(6,7,8, 9, 10)) & (ess1$cntry == "FI")
ess1$region[adjust_regunit_fi1 & ess1$region == "FI1D6"] <- "FI1D9"
ess1$region[adjust_regunit_fi1 & ess1$region == "FI1D4"] <- "FI1D8"

## Greece -> adjust different NUTS levels used in ess round 5 
adjust_regunit_gr <- (ess1$essround %in% c(5)) & (ess1$cntry == "GR")
ess1$region[adjust_regunit_gr & ess1$region == "GR11"] <- "EL51" # Anatoliki Makedonia, Thraki
ess1$region[adjust_regunit_gr & ess1$region == "GR12"] <- "EL52" # Kentriki Makedonia
ess1$region[adjust_regunit_gr & ess1$region == "GR13"] <- "EL53" # Dytiki Makedonia
ess1$region[adjust_regunit_gr & ess1$region == "GR14"] <- "EL61" # Thessalia
ess1$region[adjust_regunit_gr & ess1$region == "GR41"] <- "EL41" # Voreio Aigaio
ess1$region[adjust_regunit_gr & ess1$region == "GR21"] <- "EL54" # Ipeiros
ess1$region[adjust_regunit_gr & ess1$region == "GR22"] <- "EL62" # Ionia Nisia
ess1$region[adjust_regunit_gr & ess1$region == "GR23"] <- "EL63" # Dytiki Ellada
ess1$region[adjust_regunit_gr & ess1$region == "GR24"] <- "EL64" # Sterea Ellada
ess1$region[adjust_regunit_gr & ess1$region == "GR25"] <- "EL65" # Peloponnisos
ess1$region[adjust_regunit_gr & ess1$region == "GR30"] <- "EL30" # Attiki
ess1$region[adjust_regunit_gr & ess1$region == "GR42"] <- "EL42" # Notio Aigaio
ess1$region[adjust_regunit_gr & ess1$region == "GR43"] <- "EL43" # Kriti


## France -> adjust different NUTS levels used in ESS rounds 5, 6 and 7
adjust_regunit_fr <- (ess1$essround %in% c(5, 6, 7)) & (ess1$cntry == "FR")
ess1$region[adjust_regunit_fr & ess1$region == "FR10"] <- "FR10" #Île de France
ess1$region[adjust_regunit_fr & ess1$region == "FR24"] <- "FRB0" #Centre
ess1$region[adjust_regunit_fr & ess1$region == "FR26"] <- "FRC1" #Bourgogne
ess1$region[adjust_regunit_fr & ess1$region == "FR43"] <- "FRC2" #Franche-Comté
ess1$region[adjust_regunit_fr & ess1$region == "FR25"] <- "FRD1" #Basse-Normandie
ess1$region[adjust_regunit_fr & ess1$region == "FR23"] <- "FRD2" #Haute-Normandie
ess1$region[adjust_regunit_fr & ess1$region == "FR30"] <- "FRE1" #Nord - Pas-de-Calais
ess1$region[adjust_regunit_fr & ess1$region == "FR22"] <- "FRE2" #Picardie
ess1$region[adjust_regunit_fr & ess1$region == "FR42"] <- "FRF1" #Alsace
ess1$region[adjust_regunit_fr & ess1$region == "FR21"] <- "FRF2" #Champagne-Ardenne
ess1$region[adjust_regunit_fr & ess1$region == "FR41"] <- "FRF3" #Lorraine
ess1$region[adjust_regunit_fr & ess1$region == "FR51"] <- "FRG0" #Pays de la Loire
ess1$region[adjust_regunit_fr & ess1$region == "FR52"] <- "FRH0" #Bretagne
ess1$region[adjust_regunit_fr & ess1$region == "FR61"] <- "FRI1" #Aquitaine
ess1$region[adjust_regunit_fr & ess1$region == "FR63"] <- "FRI2" #Limousin
ess1$region[adjust_regunit_fr & ess1$region == "FR53"] <- "FRI3" #Poitou-Charentes
ess1$region[adjust_regunit_fr & ess1$region == "FR81"] <- "FRJ1" #Languedoc-Roussillon
ess1$region[adjust_regunit_fr & ess1$region == "FR62"] <- "FRJ2" #Midi-Pyrénées
ess1$region[adjust_regunit_fr & ess1$region == "FR72"] <- "FRK1" #Auvergne
ess1$region[adjust_regunit_fr & ess1$region == "FR71"] <- "FRK2" #Rhône-Alpes
ess1$region[adjust_regunit_fr & ess1$region == "FR82"] <- "FRL0" #Provence-Alpes-Côte d'Azur
ess1$region[adjust_regunit_fr & ess1$region == "FR83"] <- "FRM0" #Corse 
ess1$region[adjust_regunit_fr & ess1$region == "FRA1"] <- "FRY1" #Guadeloupe
ess1$region[adjust_regunit_fr & ess1$region == "FRA2"] <- "FRY2" #Martinique
ess1$region[adjust_regunit_fr & ess1$region == "FRA3"] <- "FRY3" #Guyane
ess1$region[adjust_regunit_fr & ess1$region == "FRA4"] <- "FRY4" #La Réunion
ess1$region[adjust_regunit_fr & ess1$region == "FRA5"] <- "FRY5" #Mayotte


# Lithuania -> adjust different NUTS levels used in ESS rounds 5, 6, 7, 8 and 9
adjust_regunit_lt <- (ess1$essround %in% c(5, 6, 7, 8, 9)) & (ess1$cntry == "LT")
ess1$region[adjust_regunit_lt & ess1$region == "LT001"] <- "LT021" # Alytaus apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT002"] <- "LT022" # Kauno apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT003"] <- "LT023" # Klaipedos apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT004"] <- "LT024" # Marijampoles apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT005"] <- "LT025" # Panevežio apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT006"] <- "LT026" # Šiauliu apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT007"] <- "LT027" # Taurages apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT008"] <- "LT028" # Telšiu apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT009"] <- "LT029" # Utenos apskritis
ess1$region[adjust_regunit_lt & ess1$region == "LT00A"] <- "LT011" # Vilniaus apskritis

# Poland -> adjust different NUTS levels used in ess round 5, 6, 7, 8 and 9  
adjust_regunit_pl <- (ess1$essround %in% c(5, 6, 7, 8, 9)) & (ess1$cntry == "PL")
ess1$region[adjust_regunit_pl & ess1$region == "PL11"] <- "PL71" # Łódzkie 
ess1$region[adjust_regunit_pl & ess1$region == "PL12"] <- "PL92" # Mazowieckie 
ess1$region[adjust_regunit_pl & ess1$region == "PL31"] <- "PL81" # Lubelskie 
ess1$region[adjust_regunit_pl & ess1$region == "PL32"] <- "PL82" # Podkarpackie 
ess1$region[adjust_regunit_pl & ess1$region == "PL33"] <- "PL72" # Swietokrzyskie 
ess1$region[adjust_regunit_pl & ess1$region == "PL34"] <- "PL84" # Podlaskie 

# Slovenia -> adjust different NUTS levels used in ess round 5, 6, 7, 8 and 9 
adjust_regunit_si <- (ess1$essround %in% c(5, 6, 7, 8, 9)) & (ess1$cntry == "SI")
ess1$region[adjust_regunit_si & ess1$region == "SI011"] <- "SI031" # Pomurska 
ess1$region[adjust_regunit_si & ess1$region == "SI012"] <- "SI032" # Podravska
ess1$region[adjust_regunit_si & ess1$region == "SI013"] <- "SI033" # Koroška 
ess1$region[adjust_regunit_si & ess1$region == "SI014"] <- "SI034" # savinjska statistična regija 
ess1$region[adjust_regunit_si & ess1$region == "SI015"] <- "SI035" # zasavska statistična regija 
ess1$region[adjust_regunit_si & ess1$region == "SI016"] <- "SI036" # Spodnjeposavska statistična regija 
ess1$region[adjust_regunit_si & ess1$region == "SI021"] <- "SI041" # osrednjeslovenska  regija 
ess1$region[adjust_regunit_si & ess1$region == "SI017"] <- "SI037" # Jugovzhodna Slovenija
ess1$region[adjust_regunit_si & ess1$region == "SI018"] <- "SI038" # Notranjsko-kraška
ess1$region[adjust_regunit_si & ess1$region == "SI022"] <- "SI042" # Gorenjska
ess1$region[adjust_regunit_si & ess1$region == "SI023"] <- "SI043" # Goriška
ess1$region[adjust_regunit_si & ess1$region == "SI024"] <- "SI044" # Obalno-kraška


# Croatia -> adjust different NUTS level used in ess round 10 
adjust_regunit_hr <- (ess1$essround %in% c(10)) & (ess1$cntry == "HR")
ess1$region[adjust_regunit_hr & ess1$region == "HR028"] <- "HR04E" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR027"] <- "HR04D" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR026"] <- "HR04C" # 
ess1$region[adjust_regunit_hr & ess1$region == "HR025"] <- "HR04B" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR024"] <- "HR04A" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR023"] <- "HR049" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR022"] <- "HR048" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR021"] <- "HR047" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR061"] <- "HR046" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR063"] <- "HR045" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR062"] <- "HR044" #  
ess1$region[adjust_regunit_hr & ess1$region == "HR064"] <- "HR043" #
ess1$region[adjust_regunit_hr & ess1$region == "HR065"] <- "HR042" #
ess1$region[adjust_regunit_hr & ess1$region == "HR050"] <- "HR041" #
ess1$region[adjust_regunit_hr & ess1$region == "HR037"] <- "HR037" #
ess1$region[adjust_regunit_hr & ess1$region == "HR036"] <- "HR036" #
ess1$region[adjust_regunit_hr & ess1$region == "HR035"] <- "HR035" #
ess1$region[adjust_regunit_hr & ess1$region == "HR034"] <- "HR034" #
ess1$region[adjust_regunit_hr & ess1$region == "HR033"] <- "HR033" #
ess1$region[adjust_regunit_hr & ess1$region == "HR031"] <- "HR031" #

# Hungary -> remove region HU10 from round 1, 2, 3, and 4 
ess1 <- ess1 %>%
  filter(!(essround %in% c(1, 2, 3, 4) & region == "HU10"))
# ess1 <- ess1 %>% mutate(region = if_else(ess_round %in% c(1, 2, 3, 4) & region == "HU10", NA_character_, region))

# Hungary 
adjust_regunit_hu <- (ess1$essround %in% c(5, 6, 7, 8)) & (ess1$cntry == "HU")
ess1$region[adjust_regunit_hu & ess1$region == "HU101"] <- "HU110" #
ess1$region[adjust_regunit_hu & ess1$region == "HU102"] <- "HU120" #


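### Create NUTS columns 
# Preliminary NUTS 2 column (overwritten by the NUTS 2 case_when() further below)
ess1$nuts2 <- ifelse(ess1$regunit == 2, ess1$region, NA)

# NUTS 3 column: keep the region code where the reporting unit is NUTS 3
ess1$nuts3 <- ifelse(ess1$regunit == 3, ess1$region, NA)

# Preliminary NUTS 1 column (overwritten by the case_when() directly below)
ess1$nuts1 <- ifelse(ess1$regunit == 1, ess1$region, NA)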


ess1$nuts1 <- case_when(
  ess1$regunit == 1 ~ ess1$region,
  ess1$cntry == "AT" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "BE" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "BG" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 2),
  ess1$cntry == "CY" & ess1$regunit == 2 ~ "CY0",
  ess1$cntry == "CZ" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "CH" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "DK" & (ess1$regunit == 2 | ess1$regunit == 3) ~ "DK0",
  ess1$cntry == "EE" & ess1$regunit == 3 ~ "EE0",
  ess1$cntry == "ES" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "FI" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 2),
  ess1$cntry == "FR" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "GR" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "HR" & (ess1$regunit == 2 | ess1$regunit == 3) ~ "HR0",
  ess1$cntry == "HU" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "HU" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 2),
  ess1$cntry == "IE" ~ "IE0",
  ess1$cntry == "IS" ~ "IS0",
  ess1$cntry == "IT" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "LT" ~ "LT0",
  ess1$cntry == "LU" ~ "LU0",
  ess1$cntry == "LV" ~ "LV0",
  ess1$cntry == "NL" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1), 
  ess1$cntry == "NL" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 2),
  ess1$cntry == "NO" ~ "NO0",
  ess1$cntry == "PL" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "PT" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "RO" & ess1$regunit == 2 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "SE" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 2),
  ess1$cntry == "SI" ~ "SI0",
  ess1$cntry == "SK" ~ "SK0",
  TRUE ~ NA_character_
)


# Replace 99999 with NA
ess1$region[ess1$region == 99999] <- NA

# NUTS 2 # 
ess1$nuts2 <- case_when(
  ess1$regunit == 2 ~ ess1$region,
  ess1$cntry == "BG" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "CZ" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "DK" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "EE" & ess1$regunit == 3 ~ "EE00",
  ess1$cntry == "FI" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "HU" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "HR" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "IE" & ess1$regunit == 3 ~ ess1$region,
  ess1$cntry == "LT" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "SE" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "LV" & ess1$regunit == 3 ~ "LV00",
  ess1$cntry == "NL" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "SI" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "SK" & ess1$regunit == 3 ~ substr(ess1$region, 1, nchar(ess1$region) - 1),
  ess1$cntry == "LU" & ess1$regunit == 1 ~ "LU00",
  ess1$cntry == "IS" & ess1$regunit == 1 ~ "IS00",
  ess1$cntry == "IS" & ess1$regunit == 3 ~ "IS00",
  ess1$cntry == "CY" & ess1$regunit == 1 ~ "CY00",
  ess1$cntry == "DE" ~ ess1$nuts1,
  ess1$cntry == "GB" ~ ess1$nuts1,
  ess1$cntry == "BE" & ess1$regunit == 1 ~ ess1$nuts1,
  ess1$cntry == "IT" & ess1$regunit == 1 ~ ess1$nuts1,
  TRUE ~ NA_character_
)

ess1$region[ess1$region == 99999] <- NA

#ess1$nuts2[ess1$regunit == 1] <- ess1$region[ess1$regunit == 1]

# Seven cases have no NUTS 2 code (missing values in the column nuts2); remove them
ess1 <- ess1[!is.na(ess1$nuts2), ]

# Some final adjustments 
names(ess1)[names(ess1) == "region"] <- "nuts"

library(eurostat)

# Download the dataset from the eurostat package 
geo_data <- eurostat_geodata_60_2016

# change the column name of geo_data for the merging process 
names(geo_data)[names(geo_data) == "NUTS_ID"] <- "nuts"

# only keep the relevant data in the dataset 
geo_data <- geo_data[, c("nuts", "NAME_LATN", "geometry")]

ess1 <- merge(ess1, geo_data, by = "nuts", all.x = TRUE)

remove(geo_data)

# Change the name of column 'regunit' to 'nuts_level'
names(ess1)[names(ess1) == "regunit"] <- "nuts_level"

# Convert all characters to lowercase and capitalise the first letter of each word
ess1$NAME_LATN <- tools::toTitleCase(tolower(ess1$NAME_LATN)) # there is no missing value 

# Now create a variable for the census year (year_census)
ess1 <- ess1 %>%
  mutate(year_census = case_when(
    ess_year >= 2000 & ess_year <= 2009 ~ 2000,
    ess_year >= 2010 & ess_year <= 2019 ~ 2010,
    ess_year == 2020 ~ 2020,
    TRUE ~ NA_real_  # This will assign NA to any year outside the specified ranges
  ))

# Recode the NUTS classifications for Ireland 
ess1 <- ess1 %>%
  mutate(nuts2 = case_when(
    nuts2 == "IE041" ~ "IE04",     
    nuts2 == "IE042" ~ "IE04", 
    nuts2 == "IE051" ~ "IE05", 
    nuts2 == "IE052" ~ "IE05", 
    nuts2 == "IE053" ~ "IE05", 
    nuts2 == "IE061" ~ "IE06", 
    nuts2 == "IE062" ~ "IE06", 
    nuts2 == "IE063" ~ "IE06", 
    TRUE ~ nuts2                  # Leave other values unchanged
  ))
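
The round-specific recodes in this section all follow one pattern: within a set of ESS rounds and one country, an old region code is replaced by its NUTS 2016 equivalent. As a minimal sketch (not the approach used in this script), the same logic can be written with a named lookup vector. The helper name recode_region() and the toy data are assumptions made for illustration only.

```r
# Hypothetical helper: recode region codes via a named lookup vector
# (names = old codes, values = new codes) for selected rounds and one country.
recode_region <- function(data, lookup, rounds, country) {
  hit <- data$essround %in% rounds &
    data$cntry == country &
    data$region %in% names(lookup)
  hit[is.na(hit)] <- FALSE  # rows with missing keys are left untouched
  data$region[hit] <- unname(lookup[data$region[hit]])
  data
}

# Toy data standing in for ess1 (illustrative values only)
toy <- data.frame(
  essround = c(5, 5, 9, 5),
  cntry    = c("LT", "LT", "LT", "PL"),
  region   = c("LT001", "LT00A", "LT021", "PL11"),
  stringsAsFactors = FALSE
)

# Two of the Lithuanian mappings used in this section
lt_lookup <- c(LT001 = "LT021", LT00A = "LT011")

toy_recoded <- recode_region(toy, lt_lookup, rounds = 5:9, country = "LT")
toy_recoded$region
```

Keeping the mappings in one vector per country makes them easier to review against the official NUTS correspondence tables than a long series of single assignments.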

6. Regional-Level Control Variables

6.1 Regional Unemployment Rates (20-64 years)

The data are based on the NUTS 2021 classification and can be obtained via the following link: https://ec.europa.eu/eurostat/databrowser/view/lfst_r_lfu3rt__custom_12066058/default/table?lang=en

Load and Clean Regional Unemployment Data:

  • Load the data and remove unnecessary columns and time periods.

  • Rename columns for consistency.

  • Adjust NUTS codes for Norway, Slovenia, and Croatia.

Handle Missing Values:

  • Address missing data for various regions and years by imputing values based on available information or calculations.

  • Adjust specific regions (e.g., Hungary, Ireland) based on additional datasets and expert knowledge.

Merge and Final Adjustments:

  • Merge the cleaned unemployment data with ess1.

  • Handle any remaining missing values with specific imputed values.

Cleanup:

  • Remove intermediate variables and datasets that are no longer needed.
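
A short sketch of the load-and-clean steps listed above, run on a toy table that imitates the column layout of a Eurostat CSV export. The object names toy_raw and toy_clean and all values are assumptions for illustration. The sketch also shows that a column name containing a space has to be given without backticks when it is compared against names().

```r
library(dplyr)

# Toy table imitating a Eurostat export (illustrative values only)
toy_raw <- data.frame(
  DATAFLOW      = "toy",
  `LAST UPDATE` = "toy",
  geo           = c("AT11", "AT11", "AT12"),
  TIME_PERIOD   = c(2008, 2009, 2008),
  OBS_VALUE     = c(3.1, 3.9, 3.4),
  check.names   = FALSE,
  stringsAsFactors = FALSE
)

# Backticks are syntax, not part of the name: only the plain string matches
"`LAST UPDATE`" %in% names(toy_raw)
"LAST UPDATE" %in% names(toy_raw)

years_to_drop <- c(2003, 2005, 2007, 2009)

toy_clean <- toy_raw %>%
  # remove the time periods that are not needed
  filter(!TIME_PERIOD %in% years_to_drop) %>%
  # drop columns by name; any_of() ignores names that are absent
  dplyr::select(-any_of(c("DATAFLOW", "LAST UPDATE", "OBS_FLAG"))) %>%
  # rename the key and value columns for the later merge
  rename(ess_year = TIME_PERIOD,
         nuts2 = geo,
         regional_unemployment = OBS_VALUE)

names(toy_clean)
```

Dropping columns by name rather than by position keeps the step robust if the export layout changes.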
# Load the dataset
unemployment_regional <- read_csv("Datasets/Regional_Unemployment_Eurostat.csv")

# Check for NUTS recodings
# OK: AT, BE, BG, CH, CY, CZ, DE, IS, IT, LU, LV, NL, PT, RO, SE, SK, ES
# Despite NUTS change, OK: FR, DK, EE, FI, GR, 
# Check: Hungary (HU10 -> HU12, HU11), HR, LT, NO (change from NO0A to NO05), 
# PL (check for PL92), SI, IE (with IE01 and IE02) 

time_periods_to_remove <- c("2003", "2005", "2007", "2009", "2011", "2013", "2015", 
                            "2017", "2019", "2021", "2022")

columns_to_remove <- c("DATAFLOW", "freq", "isced11", "sex", "age", 
                       "unit", "OBS_FLAG", "`LAST UPDATE`")


unemployment_regional <- unemployment_regional %>%
  filter(!TIME_PERIOD %in% time_periods_to_remove)

unemployment_regional <- unemployment_regional[, 
  !(names(unemployment_regional) %in% columns_to_remove)
]

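# Drop the first remaining column by position. Note that the entry "`LAST UPDATE`" in
# columns_to_remove contains literal backticks, so it does not match a column
# named LAST UPDATE and that column is not dropped by name above.
unemployment_regional <- unemployment_regional[, -1]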

# Rename the key and value columns for the merge with ess1
unemployment_regional <- unemployment_regional %>% 
  rename(ess_year = TIME_PERIOD,
         nuts2 = geo,
         regional_unemployment = OBS_VALUE)

# Now I need to make some changes for Norway due to nuts recoding in 2021 
unemployment_regional <- unemployment_regional %>%
  mutate(nuts2 = ifelse(nuts2 == "NO0A", "NO05", nuts2))

# I do have some missing values that I have to adjust
# The following dataset will be used: https://www.gu.se/en/quality-government/qog-data/data-downloads/eu-regional-dataset
regional_unemployment_check <- read_excel("Datasets/Regional_Unemployment.xlsx")

regional_unemployment_check <- dplyr::select(regional_unemployment_check, year, 
                                             nuts2, eu_unemp_2064t_nuts2)

regional_unemployment_check$eu_unemp_2064t_nuts2 <- as.numeric(regional_unemployment_check$eu_unemp_2064t_nuts2)

regional_unemployment_check <- regional_unemployment_check %>% 
  rename(ess_year = year)

regional_unemployment_check <- regional_unemployment_check %>% 
  rename(regional_unemployment = eu_unemp_2064t_nuts2)

regional_unemployment1_check <- read_excel("regional_unemployment_Nuts1.xlsx")

regional_unemployment1_check <- dplyr::select(regional_unemployment1_check, year, nuts1,
                                              eu_unemp_2064t_nuts1)
# Rename the columns so that the NUTS 1 table can be stacked on the NUTS 2 table
regional_unemployment1_check <- regional_unemployment1_check %>% 
  rename(ess_year = year,
         nuts2 = nuts1,
         regional_unemployment = eu_unemp_2064t_nuts1)

# combine both datasets 
regional_unemployment_check <- rbind(regional_unemployment_check, 
                                     regional_unemployment1_check)

# make it numeric 
regional_unemployment_check$regional_unemployment <- as.numeric(regional_unemployment_check$regional_unemployment)


# Now make some adjustments for Slovenia 
unemployment_regional <- unemployment_regional %>% mutate(nuts2 = ifelse(nuts2 == "SI02", 
                                                                         "SI04", 
                                                                         nuts2))

unemployment_regional <- unemployment_regional %>% mutate(nuts2 = ifelse(nuts2 == "SI01", 
                                                                         "SI03", 
                                                                         nuts2))

# Now make some adjustments for Croatia 
unemployment_regional <- unemployment_regional %>% mutate(nuts2 = ifelse(nuts2 == "HR04", 
                                                                         "HR02", 
                                                                         nuts2))

# Merge both datasets
ess2 <- merge(ess1, unemployment_regional, by = c("nuts2", "ess_year"), 
              all.x = TRUE)

# For information on Norway, consider example 1 (the replication dataset) 
ess2 <- ess2 %>%
  mutate(regional_unemployment = ifelse(ess_year == 2008 & nuts2 == "NO01", 3.166667, 
                                        regional_unemployment))
ess2 <- ess2 %>%
  mutate(regional_unemployment = ifelse(ess_year == 2016 & nuts2 == "NO01", 3, 
                                        regional_unemployment))

ess2 <- ess2 %>%
  mutate(regional_unemployment = ifelse(ess_year == 2008 & nuts2 == "NO03", 3.3, 
                                        regional_unemployment))

ess2 <- ess2 %>%
  mutate(regional_unemployment = ifelse(ess_year == 2016 & nuts2 == "NO03", 3.3, 
                                        regional_unemployment))

ess2 <- ess2 %>%
  mutate(regional_unemployment = ifelse(ess_year == 2008 & nuts2 == "NO04", 3.583333, 
                                        regional_unemployment))

ess2 <- ess2 %>%
  mutate(regional_unemployment = ifelse(ess_year == 2016 & nuts2 == "NO04", 1.9, 
                                        regional_unemployment))

# select the columns 
regional_unemployment_ireland <- dplyr::select(example1, country, nuts2, 
                                               year_ess, unemp2064)

regional_unemployment_ireland <- regional_unemployment_ireland %>%
  filter(country == "IE")

regional_unemployment_ireland <- regional_unemployment_ireland %>%
  mutate(nuts_adjusted = case_when(
    nuts2 == "IE011" ~ "IE04", # Border
    nuts2 == "IE012" ~ "IE06", # Midland
    nuts2 == "IE013" ~ "IE04", # West
    nuts2 == "IE021" ~ "IE06", # Dublin
    nuts2 == "IE022" ~ "IE06", # Mid_East
    nuts2 == "IE023" ~ "IE05", # Mid_West
    nuts2 == "IE024" ~ "IE05", # South East
    nuts2 == "IE025" ~ "IE05", # South West
    TRUE ~ nuts2 # Default case, if no match found keep the original value
  ))

# Remove the first two columns
regional_unemployment_ireland <- regional_unemployment_ireland[, -c(1, 2)]

# Inspect mean, sum and number of observations per ESS year and adjusted region
summary <- regional_unemployment_ireland %>%
  group_by(year_ess, nuts_adjusted) %>%
  summarize(
    mean_unemp = mean(unemp2064, na.rm = TRUE),
    sum_unemp = sum(unemp2064, na.rm = TRUE),
    count = n()
  )

# For PL92 Mazowieckie, I will take the value from the NUTS1 level (PL7) -> Makroregion Centralny 
ess2 <- ess2 %>%
  mutate(regional_unemployment = ifelse(ess_year == 2008 & nuts2 == "PL92", 7.3,
                                        regional_unemployment))

# There is only missing data for FI20
remove(unemployment_regional, 
       regional_unemployment_check, 
       regional_unemployment1_check, 
       regional_unemployment_ireland, summary)
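
After a left merge it can help to list which region-year combinations received no value, for example to confirm which regions still have gaps. A minimal sketch on toy data follows; toy_ess and toy_unemp are hypothetical stand-ins for ess1 and the unemployment table, and the values are invented.

```r
library(dplyr)

# Toy panel and toy regional table (illustrative values only)
toy_ess <- data.frame(
  nuts2    = c("AT11", "AT11", "FI20", "NO01"),
  ess_year = c(2008, 2010, 2008, 2008),
  stringsAsFactors = FALSE
)
toy_unemp <- data.frame(
  nuts2    = c("AT11", "AT11"),
  ess_year = c(2008, 2010),
  regional_unemployment = c(3.1, 3.6),
  stringsAsFactors = FALSE
)

# Left merge: every row of the panel is kept
toy_merged <- merge(toy_ess, toy_unemp,
                    by = c("nuts2", "ess_year"), all.x = TRUE)

# Region-years left without a value after the merge
toy_missing <- toy_merged %>%
  filter(is.na(regional_unemployment)) %>%
  distinct(nuts2, ess_year) %>%
  arrange(nuts2, ess_year)

toy_missing
```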

6.2 Regional Population Density

The dataset can be obtained from the following website: https://www.gu.se/en/quality-government/qog-data/data-downloads/eu-regional-dataset

  • Loaded regional population density data from Excel files (regional_population_density.xlsx and regional_population_density_Nuts1.xlsx).

  • Transformed columns:

    • In regional_population_density.xlsx: Renamed year to ess_year and eu_per_km2_nuts2 to regional_population_density.

    • In regional_population_density_Nuts1.xlsx: Renamed year to ess_year, nuts1 to nuts2, and eu_per_km2_nuts1 to regional_population_density.

  • Combined datasets using rbind to consolidate data.

Merge with Panel Data:

  • Merged the cleaned regional population density data with ess2 based on nuts2 and ess_year.
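
The stacking step can be sketched as follows. Because some countries are keyed by their NUTS 1 code in the nuts2 column of the panel (see the NUTS 2 recoding above for DE and GB), the NUTS 1 table is renamed to the same column layout and appended to the NUTS 2 table before merging. All object names prefixed with toy_ and all values are assumptions for illustration.

```r
library(dplyr)

# Toy stand-ins for the two Excel sheets (illustrative values only)
toy_density_nuts2 <- data.frame(
  year = c(2008, 2008),
  nuts2 = c("AT11", "AT12"),
  eu_per_km2_nuts2 = c("70.5", "83.2"),
  stringsAsFactors = FALSE
)
toy_density_nuts1 <- data.frame(
  year = 2008,
  nuts1 = "DE1",
  eu_per_km2_nuts1 = "300.4",
  stringsAsFactors = FALSE
)

# Harmonise the column names of both tables
toy_density_nuts2 <- toy_density_nuts2 %>%
  rename(ess_year = year,
         regional_population_density = eu_per_km2_nuts2)
toy_density_nuts1 <- toy_density_nuts1 %>%
  rename(ess_year = year,
         nuts2 = nuts1,
         regional_population_density = eu_per_km2_nuts1)

# Stack both tables and convert the value column to numeric
toy_density <- rbind(toy_density_nuts1, toy_density_nuts2)
toy_density$regional_population_density <-
  as.numeric(toy_density$regional_population_density)

# Toy panel: the second row is keyed by a NUTS 1 code in the nuts2 column
toy_panel <- data.frame(
  nuts2 = c("AT11", "DE1"),
  ess_year = c(2008, 2008),
  stringsAsFactors = FALSE
)

toy_panel <- merge(toy_panel, toy_density,
                   by = c("nuts2", "ess_year"), all.x = TRUE)
toy_panel
```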
# Load the dataset
regional_population_density <- read_excel("Datasets/Regional_Population_Density.xlsx")

regional_population_density <- dplyr::select(regional_population_density, year, 
                                             nuts2, eu_per_km2_nuts2)

regional_population_density$eu_per_km2_nuts2 <- as.numeric(regional_population_density$eu_per_km2_nuts2)

regional_population_density <- regional_population_density %>% 
  rename(ess_year = year)

regional_population_density <- regional_population_density %>% 
  rename(regional_population_density = eu_per_km2_nuts2)

regional_population_density1 <- read_excel("Datasets/Regional_Population_Density_Nuts1.xlsx")

regional_population_density1 <- dplyr::select(regional_population_density1, year, 
                                              nuts1, eu_per_km2_nuts1)

# Rename the columns so that the NUTS 1 table can be stacked on the NUTS 2 table
regional_population_density1 <- regional_population_density1 %>% 
  rename(ess_year = year,
         nuts2 = nuts1,
         regional_population_density = eu_per_km2_nuts1)

# Combine both datasets
regional_population_density <- rbind(regional_population_density1, 
                                     regional_population_density)

# Make it numeric 
regional_population_density$regional_population_density <- as.numeric(regional_population_density$regional_population_density)

ess2 <- merge(ess2, regional_population_density, 
              by = c("nuts2", "ess_year"), all.x = TRUE)

6.3 Regional Old-Age Dependency Ratio

The data can be obtained via the following link: https://ec.europa.eu/eurostat/databrowser/view/demo_r_pjanind2/default/table?lang=en

Read and Transform Data:

  • Loaded old age dependency data from CSV file old_age_dependency.csv.

  • Removed unnecessary columns:

    • Excluded columns such as DATAFLOW, freq, indic_de, unit, and OBS_FLAG.

  • Filtered data to remove specified time periods: 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019, 2021, 2022.

  • Renamed columns for clarity:

    • TIME_PERIOD to ess_year

    • geo to nuts2

    • OBS_VALUE to regional_old_age_dependency

Merge with Panel Data:

  • Merged the cleaned old-age dependency data with ess2 based on nuts2 and ess_year.

Cleanup:

  • Removed the intermediate dataset old_age.
# Load the dataset
old_age <- read_csv("Datasets/Regional_Old_Age_Dependency.csv")

time_periods_to_remove <- c("2003", "2005", "2007", "2009", "2011", "2013", 
                            "2015", "2017", "2019", "2021", "2022")

columns_to_remove <- c("DATAFLOW", "freq", "indic_de", "unit", 
                       "OBS_FLAG")

old_age <- old_age[!(old_age$TIME_PERIOD %in% time_periods_to_remove), ]
old_age <- old_age[, !(names(old_age) %in% columns_to_remove)]
old_age <- old_age[, -1]

# Rename the key and value columns for the merge with ess2
old_age <- old_age %>%
  rename(ess_year = TIME_PERIOD,
         nuts2 = geo,
         regional_old_age_dependency = OBS_VALUE)

# Merge both datasets 
ess2 <- merge(ess2, old_age, 
              by = c("nuts2", "ess_year"), all.x = TRUE)

# remove the dataset from the environment 
remove(old_age)

6.4 Regional Net Migration

The dataset can be obtained from the following website: https://urban.jrc.ec.europa.eu/ardeco/viewer/SNMTN?jdvfys=asc&jdvfc=all&jdvfnl=1%2C2%2C3

  • Data Import: Reads the regional_net_migration.xlsx file into a dataframe.

  • Data Transformation:

    • Uses pivot_longer() to reshape data from wide to long format, creating columns for ess_year and regional_net_migration.

    • Converts regional_net_migration values to numeric.

  • Column Renaming: Changes the column name NUTS to nuts2 for consistency.

  • Data Integration: Merges the reshaped migration data with the ess2 dataset on nuts2 and ess_year, preserving all rows from ess2.
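
The reshaping step can be illustrated on a toy table. By default pivot_longer() stores the former column names as character strings, so the year column is converted explicitly here through the names_transform argument (available in tidyr 1.1.0 and later). The toy values and the toy_ object names are assumptions for illustration.

```r
library(tidyr)
library(dplyr)

# Toy wide table: one row per region, one column per year (illustrative values)
toy_wide <- data.frame(
  NUTS   = c("AT11", "AT12"),
  `2008` = c("120", "-35"),
  `2010` = c("80", "15"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

toy_long <- toy_wide %>%
  # one row per region and year; year names are converted to integer
  pivot_longer(cols = `2008`:`2010`,
               names_to = "ess_year",
               names_transform = list(ess_year = as.integer),
               values_to = "regional_net_migration") %>%
  # values arrive as text in this toy table and are made numeric
  mutate(regional_net_migration = as.numeric(regional_net_migration)) %>%
  rename(nuts2 = NUTS)

str(toy_long)
```

With a numeric year column the merge key has the same type on both sides, which avoids relying on implicit type coercion during the merge.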

# Read the dataset
regional_net_migration <- read_excel("Datasets/Regional_Net_Migration.xlsx")

# Make it from wide to long format 
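regional_net_migration <- regional_net_migration %>%
  pivot_longer(cols = `1990`:`2022`,  
               names_to = "ess_year",     
               # convert the year names from character to numeric for the merge
               names_transform = list(ess_year = as.numeric),
               values_to = "regional_net_migration")  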

regional_net_migration$regional_net_migration <- as.numeric(regional_net_migration$regional_net_migration)

# Rename 
regional_net_migration <- regional_net_migration %>% 
  rename(nuts2 = NUTS)

# Merge both datasets
ess2 <- merge(ess2, regional_net_migration, by = c("nuts2", "ess_year"), all.x = TRUE)

6.5 Regional GDP at current market prices by NUTS 2 regions (euro per inhabitant in percentage of the EU27)

The data can be obtained via the following link: https://ec.europa.eu/eurostat/databrowser/view/nama_10r_2gdp__custom_12226750/default/table?lang=en

Read and Transform Data:

  • Loaded GDP data from CSV file regional_gdp_euroinhabitantpercentage.csv.

  • Removed unnecessary columns (e.g., DATAFLOW, freq, sex, age, unit, OBS_FLAG, LAST UPDATE).

  • Filtered data to retain relevant years.

  • Renamed columns for clarity: TIME_PERIOD to ess_year, geo to nuts2, and OBS_VALUE to regional_gdp_eurinhabitant_percentageEU.

Merge with ess2:

  • Merged the cleaned GDP data with ess2 based on nuts2 and ess_year.

Cleanup:

  • Removed intermediate datasets (e.g., gdp_euroinhabitantpercentage_regional).
# Load the dataset
gdp_euroinhabitantpercentage_regional <- read_csv("Datasets/Regiona_GDP Per Inhabitant in Percentage to EU average.csv")

columns_to_remove <- c("DATAFLOW", "freq", "sex", "age", 
                       "unit", "OBS_FLAG", "`LAST UPDATE`")

gdp_euroinhabitantpercentage_regional <- 
  gdp_euroinhabitantpercentage_regional %>%
  filter(!TIME_PERIOD %in% time_periods_to_remove)

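# dplyr::select() is called explicitly because MASS is attached after dplyr and
# masks select(); any_of() silently skips names that are not present
gdp_euroinhabitantpercentage_regional <- 
  gdp_euroinhabitantpercentage_regional %>%
  dplyr::select(-any_of(columns_to_remove))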

gdp_euroinhabitantpercentage_regional <- gdp_euroinhabitantpercentage_regional[, -1]

# Rename the key and value columns for the merge with ess2
gdp_euroinhabitantpercentage_regional <- gdp_euroinhabitantpercentage_regional %>% 
  rename(ess_year = TIME_PERIOD,
         nuts2 = geo,
         regional_gdp_eurinhabitant_percentageEU = OBS_VALUE)

# merge both datasets 
ess2 <- merge(ess2, gdp_euroinhabitantpercentage_regional, by = c("nuts2", "ess_year"), all.x = TRUE)

6.6 Census Data on Share of Migrants at the Regional Level

Loading and Filtering Data:

  • The census2001 and census2011 datasets are loaded, each representing census data from the years 2001 and 2011, respectively.

  • Initial filtering removes irrelevant rows and columns, including values based on certain geographical identifiers and citizen statuses.

Geographical Hierarchy Adjustments:

  • The NUTS2 codes, representing geographical regions within countries, are created by shortening the codes in the geo column. They are then adjusted to accommodate changes in regional classifications over time, aligning with the 2016 NUTS classification.

  • This includes modifying regions in France, Greece, Ireland, and other European nations to maintain consistency across the datasets.

Citizenship Categorisation:

  • Based on the citizen variable, a new citizen_adjusted column is created to categorise populations by origin, differentiating between native, EU-born, non-EU-born, and other regional or foreign-born groups.

Summarising and Reshaping:

  • Each dataset undergoes aggregation to create summarised metrics such as total population, native population, and foreign-born population shares by regional levels.

Yearly Expansion:

  • The census2001 and census2011 datasets are expanded by creating additional rows for interpolated years. This allows the dataset to include estimates for years between census periods.

Finalising the Combined Dataset:

  • All datasets are combined (census_overall), duplicates are removed, and the dataset is finalised with adjusted NUTS2 codes and population metrics.
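
For the citizenship categorisation step it is worth keeping in mind that case_when() assigns the label of the first condition that matches. The sketch below uses a toy vector (toy_citizen, an assumption for illustration) to show the consequence: a branch for the full EU15 list is never reached when it comes after branches that already cover every EU15 code.

```r
library(dplyr)

# Toy citizenship codes (illustrative only)
toy_citizen <- c("IT", "DE", "NAT", "AFR", "TOTAL")

southern <- c("EL", "ES", "IT", "PT")
northern <- c("AT", "BE", "DE", "DK", "FI", "FR",
              "IE", "LU", "NL", "SE", "UK")

toy_group <- case_when(
  toy_citizen %in% southern ~ "regional_foreign_born_EU15_Southern",
  toy_citizen %in% northern ~ "regional_foreign_born_EU15_Northern",
  # Never reached: every EU15 code already matched one of the branches above
  toy_citizen %in% c(southern, northern) ~ "regional_foreign_born_EU15",
  toy_citizen == "NAT" ~ "regional_native_population",
  toy_citizen == "TOTAL" ~ "regional_total_population",
  TRUE ~ toy_citizen
)

table(toy_group)
```

If a combined EU15 figure is needed, one option is to compute it after aggregation as the sum of the Southern and Northern groups.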

# Load the dataset 
census2001 <- read_csv("census2001.csv")

census2001 <- census2001 %>% 
  filter(nchar(geo) == 5)

# Create a column with the parent region code: drop the last character of the
# five-character code (NUTS 2 level); for UK codes drop the last two characters (NUTS 1 level)
remove_last_digit <- function(x) {
    ifelse(substr(x, 1, 2) == "UK",
           substr(x, 1, nchar(x) - 2),
           substr(x, 1, nchar(x) - 1))
}

# ifelse() and substr() are vectorised, so the helper is applied to the whole column
census2001$nuts2 <- remove_last_digit(census2001$geo)

# remove some columns
columns_to_remove <- c("DATAFLOW", "freq", "sex", "age", "unit", 
                       "OBS_FLAG", "`LAST UPDATE`")

census2001 <- census2001[, !(names(census2001) %in% columns_to_remove)]

census2001 <- census2001[, -1]

# I need to remove some rows based on the value in the column citizen: AFR_N, AFR_OTH, 
# AME_N, ASI_NME, ASI_OTH, EX_SU_EUR, EX_SU_ASI, OCE, EUR_OTH, EUR_C_E

unwanted_values <- c("AFR_N", "AFR_OTH", "AME_N", "ASI_NME", "ASI_OTH", 
                     "EX_SU_EUR", "EX_SU_ASI", 
                     "OCE", "EUR_OTH", "EUR_C_E", 
                     "AME_OTH", "OTH")

census2001 <- census2001 %>%
    filter(!citizen %in% unwanted_values)

# Create a new column that groups the citizenship codes
census2001 <- census2001 %>%
  mutate(citizen_adjusted = case_when(
    citizen %in% c("EL", "ES", "IT", "PT") ~ "regional_foreign_born_EU15_Southern",
    citizen %in% c("AT", "BE", "DE", "DK", 
                   "FI", "FR", "IE", 
                   "LU", "NL", "SE", "UK") ~ "regional_foreign_born_EU15_Northern",
    # Note: case_when() uses the first matching condition, so this branch is never
    # reached; all EU15 codes are already matched by the two branches above
    citizen %in% c("AT", "BE", "DE", "DK", "EL", 
                   "ES", "FI", "FR", "IE", "IT", 
                   "LU", "NL", 
                   "PT", "SE", "UK") ~ "regional_foreign_born_EU15",
    citizen %in% c("AFR") ~ "regional_foreign_born_Africa",
    citizen %in% c("AME") ~ "regional_foreign_born_America",
    citizen %in% c("ASI") ~ "regional_foreign_born_Asia",
    citizen %in% c("NAT") ~ "regional_native_population",
    citizen %in% c("EFTA") ~ "EFTA",
    citizen %in% c("EU_FOR") ~ "regional_foreign_born_EU",
    citizen %in% c("EUR") ~ "regional_foreign_born_Europe",
    citizen %in% c("TOTAL") ~ "regional_total_population",
    TRUE ~ citizen
  ))


census2001 <- census2001 %>%
    group_by(nuts2, citizen_adjusted) %>%
    summarize(total_OBS_VALUE = sum(OBS_VALUE, na.rm = TRUE))

census2001$ess_year <- 2001

example11 <- dplyr::select(example1, nuts2, year_ess, year, pct_2000, pct_2010,
                           pct_EU15A10, ln_pop_nat_value)

example11$pop_nat_value <- exp(example11$ln_pop_nat_value)

example11$regional_foreign_born_population_share_2001 <- example11$pct_2000

example11$regional_foreign_born_population_share_2011 <- example11$pct_2010

example11$regional_foreign_born_population_EU_share <- example11$pct_EU15A10

example11 <- example11[, -c(4:7)]

example11 <- example11 %>%
    rename(regional_native_population = pop_nat_value)

example11_2000 <- subset(example11, year == 2000)

example11_2000 <- example11_2000 %>%
    rename(regional_foreign_born_population_share = regional_foreign_born_population_share_2001)

example11_2000 <- example11_2000[, -c(6)]

example11_2010 <- subset(example11, year == 2010)

example11_2010 <- example11_2010 %>%
  rename(regional_foreign_born_population_share = regional_foreign_born_population_share_2011)

example11_2010 <- example11_2010[, -c(5)]

# combine both datasets 
example11 <- rbind(example11_2010, example11_2000)

example11$regional_foreign_born_population_nonEU_share <- example11$regional_foreign_born_population_share - example11$regional_foreign_born_population_EU_share

example11$regional_total_population <- example11$regional_native_population/(1-example11$regional_foreign_born_population_share)

example11$regional_foreign_born_population <- example11$regional_total_population*example11$regional_foreign_born_population_share

example11$regional_foreign_born_populationEU <- example11$regional_total_population*example11$regional_foreign_born_population_EU_share

example11 <- example11[, -c(3)]

example11$regional_foreign_born_population_nonEU <- example11$regional_total_population*example11$regional_foreign_born_population_nonEU_share

# Harmonise the column names with those of the census tables
example11 <- example11 %>%
  rename(ess_year = year_ess,
         regional_foreign_born_EU = regional_foreign_born_populationEU,
         regional_foreign_born_nonEU_share = regional_foreign_born_population_nonEU_share,
         regional_foreign_born_nonEU = regional_foreign_born_population_nonEU,
         regional_foreign_born_EU_share = regional_foreign_born_population_EU_share)

census2001 <- census2001 %>%
  pivot_wider(
    names_from = citizen_adjusted,
    values_from = total_OBS_VALUE,
    # Note: values_fill is keyed on number_residents, which is not the
    # values_from column (total_OBS_VALUE), so missing cells appear to stay NA
    values_fill = list(number_residents = 0)
)

census2001$regional_foreign_born_population <- census2001$regional_total_population-census2001$regional_native_population

new_years <- 2002:2009
expanded_data <- bind_rows(
lapply(new_years, function(year) {
census2001 %>%
mutate(ess_year = year)
})
)

census2001 <- rbind(expanded_data, census2001)
census2001$regional_foreign_born_Europe <- census2001$regional_foreign_born_Europe - census2001$regional_native_population

census2001$regional_foreign_born_EU15 <- census2001$regional_foreign_born_EU15_Northern + census2001$regional_foreign_born_EU15_Southern

census2001$regional_foreign_born_nonEU <- census2001$regional_foreign_born_population- census2001$regional_foreign_born_EU

census2001$regional_foreign_born_EU_share <- census2001$regional_foreign_born_EU/census2001$regional_total_population

census2001$regional_foreign_born_population_share <- census2001$regional_foreign_born_population/census2001$regional_total_population

census2001$regional_foreign_born_nonEU_share <- census2001$regional_foreign_born_nonEU/census2001$regional_total_population

census_adjusted2001 <- dplyr::select(census2001, ess_year, nuts2, 
                                     regional_total_population,
                                     regional_native_population, 
                                     regional_foreign_born_population,
                                     regional_foreign_born_EU, 
                                     regional_foreign_born_nonEU,
                                     regional_foreign_born_population_share, 
                                     regional_foreign_born_EU_share, 
                                     regional_foreign_born_nonEU_share)

####### consider the census 2011 dataset 
# Load the dataset 
census2011 <- read_csv("census2011.csv")

# select the columns 
census2011 <- dplyr::select(census2011, citizen, geo, TIME_PERIOD, OBS_VALUE)

census2011 <- census2011 %>%
mutate(citizen_adjusted = case_when(
  citizen %in% c("EU_FOR") ~ "regional_foreign_born_EU",
  citizen %in% c("FOR") ~ "regional_foreign_born_population",
  citizen %in% c("NEU") ~ "regional_foreign_born_nonEU",
  citizen %in% c("NAT") ~ "National",
  citizen %in% c("TOTAL") ~ "Total",
  TRUE ~ citizen
  ))

census2011 <- census2011 %>%
  filter(!citizen %in% c("UNK", "STLS", "EX_SU_EUR", "OTH"))

census2011 <- census2011[, -1]

census2011 <- census2011 %>%
  rename(ess_year = TIME_PERIOD)

census2011 <- census2011 %>%
  rename(nuts2 = geo)

census2011 <- census2011 %>%
  pivot_wider(
    names_from = citizen_adjusted,
    values_from = OBS_VALUE,
    values_fill = list(OBS_VALUE = 0) # Fill NAs with 0 or any other value as needed
)

census2011$regional_foreign_born_population <- census2011$Total-census2011$National
new_years <- 2010:2019
expanded_data <- bind_rows(
lapply(new_years, function(year) {
census2011 %>%
mutate(ess_year = year)
})
)

census2011 <- rbind(expanded_data, census2011)

census2011 <- census2011 %>%
  rename(regional_native_population = National)

census2011 <- census2011 %>%
  rename(regional_total_population = Total)

census2011$regional_foreign_born_EU_share <- census2011$regional_foreign_born_EU/census2011$regional_total_population

census2011$regional_foreign_born_population_share <- census2011$regional_foreign_born_population/census2011$regional_total_population

census2011$regional_foreign_born_nonEU_share <- census2011$regional_foreign_born_nonEU/census2011$regional_total_population

census_adjusted2011 <- dplyr::select(census2011, ess_year, nuts2, 
                                     regional_total_population,
                                     regional_native_population, 
                                     regional_foreign_born_population,
                                     regional_foreign_born_EU, 
                                     regional_foreign_born_nonEU,
                                     regional_foreign_born_population_share, 
                                     regional_foreign_born_EU_share, 
                                     regional_foreign_born_nonEU_share)

census_overall <- rbind(census_adjusted2001, census_adjusted2011, example11)

census_overall <- unique(census_overall)

census_overall <- census_overall[!duplicated(census_overall[, c("ess_year", "nuts2")]), ]

# Make the adjustments and use the 2016 NUTS classifications
census_overall$nuts2 <- gsub("LU00", "LU0", census_overall$nuts2)
census_overall$nuts2 <- gsub("CY00", "CY0", census_overall$nuts2)
census_overall$nuts2 <- gsub("IS00", "IS0", census_overall$nuts2)
census_overall$nuts2 <- gsub("SI02", "SI04", census_overall$nuts2)
census_overall$nuts2 <- gsub("SI01", "SI03", census_overall$nuts2)
census_overall$nuts2 <- gsub("FR10", "FR10", census_overall$nuts2) # Île de France
census_overall$nuts2 <- gsub("FR24", "FRB0", census_overall$nuts2) 
census_overall$nuts2 <- gsub("FR26", "FRC1", census_overall$nuts2) # Bourgogne
census_overall$nuts2 <- gsub("FR43", "FRC2", census_overall$nuts2) # Franche-Comté
census_overall$nuts2 <- gsub("FR25", "FRD1", census_overall$nuts2) # Basse-Normandie
census_overall$nuts2 <- gsub("FR23", "FRD2", census_overall$nuts2) # Haute-Normandie
census_overall$nuts2 <- gsub("FR30", "FRE1", census_overall$nuts2) # Nord - Pas-de-Calais
census_overall$nuts2 <- gsub("FR22", "FRE2", census_overall$nuts2) # Picardie
census_overall$nuts2 <- gsub("FR42", "FRF1", census_overall$nuts2) # Alsace
census_overall$nuts2 <- gsub("FR21", "FRF2", census_overall$nuts2)
census_overall$nuts2 <- gsub("FR41", "FRF3", census_overall$nuts2) # Lorraine
census_overall$nuts2 <- gsub("FR51", "FRG0", census_overall$nuts2) # Pays de la Loire
census_overall$nuts2 <- gsub("FR52", "FRH0", census_overall$nuts2) # Bretagne
census_overall$nuts2 <- gsub("FR61", "FRI1", census_overall$nuts2) # Aquitaine
census_overall$nuts2 <- gsub("FR63", "FRI2", census_overall$nuts2) # Limousin
census_overall$nuts2 <- gsub("FR53", "FRI3", census_overall$nuts2) # Poitou-Charentes
census_overall$nuts2 <- gsub("FR81", "FRJ1", census_overall$nuts2) # Languedoc-Roussillon
census_overall$nuts2 <- gsub("FR62", "FRJ2", census_overall$nuts2) # Midi-Pyrénées
census_overall$nuts2 <- gsub("FR72", "FRK1", census_overall$nuts2) # Auvergne
census_overall$nuts2 <- gsub("FR71", "FRK2", census_overall$nuts2) # Rhône-Alpes
census_overall$nuts2 <- gsub("FR82", "FRL0", census_overall$nuts2) # Provence-Alpes-Côte d'Azur
census_overall$nuts2 <- gsub("FR83", "FRM0", census_overall$nuts2) # Corse
census_overall$nuts2 <- gsub("FRA1", "FRY1", census_overall$nuts2) # Guadeloupe
census_overall$nuts2 <- gsub("FRA2", "FRY2", census_overall$nuts2) # Martinique
census_overall$nuts2 <- gsub("FRA3", "FRY3", census_overall$nuts2) # Guyane
census_overall$nuts2 <- gsub("FRA4", "FRY4", census_overall$nuts2) # La Réunion
census_overall$nuts2 <- gsub("FRA5", "FRY5", census_overall$nuts2) # Mayotte
census_overall$nuts2 <- gsub("PL11", "PL71", census_overall$nuts2) # Łódzkie
census_overall$nuts2 <- gsub("PL12", "PL92", census_overall$nuts2) # Mazowieckie
census_overall$nuts2 <- gsub("PL31", "PL81", census_overall$nuts2) # Lubelskie
census_overall$nuts2 <- gsub("PL32", "PL82", census_overall$nuts2) # Podkarpackie
census_overall$nuts2 <- gsub("PL33", "PL72", census_overall$nuts2) # Swietokrzyskie
census_overall$nuts2 <- gsub("PL34", "PL84", census_overall$nuts2) # Podlaskie
census_overall$nuts2 <- gsub("GR11", "EL51", census_overall$nuts2) # Anatoliki Makedonia & Thraki
census_overall$nuts2 <- gsub("GR12", "EL52", census_overall$nuts2) # Kentriki Makedonia
census_overall$nuts2 <- gsub("GR13", "EL53", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR14", "EL61", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR41", "EL41", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR21", "EL54", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR22", "EL62", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR23", "EL63", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR24", "EL64", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR25", "EL65", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR30", "EL30", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR42", "EL42", census_overall$nuts2) #
census_overall$nuts2 <- gsub("GR43", "EL43", census_overall$nuts2) #
census_overall$nuts2 <- gsub("BG11", "BG31", census_overall$nuts2) #
census_overall$nuts2 <- gsub("BG12", "BG32", census_overall$nuts2) #
census_overall$nuts2 <- gsub("BG13", "BG33", census_overall$nuts2) #
census_overall$nuts2 <- gsub("BG23", "BG34", census_overall$nuts2) #
census_overall$nuts2 <- gsub("BG21", "BG41", census_overall$nuts2) #
census_overall$nuts2 <- gsub("BG22", "BG42", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO06", "RO11", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO07", "RO12", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO01", "RO21", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO02", "RO22", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO03", "RO31", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO08", "RO32", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO04", "RO41", census_overall$nuts2) #
census_overall$nuts2 <- gsub("RO05", "RO42", census_overall$nuts2) #

# Irish NUTS3 codes to be aggregated to the 2016 NUTS2 regions
values_to_include <- c("IE011", "IE012", "IE013", "IE021", "IE022", 
                       "IE023", "IE024", "IE025")

census_overall_ireland <- census_overall[census_overall$nuts2 %in% values_to_include, ]

census_overall_ireland <- census_overall_ireland %>%
  mutate(nuts2 = case_when(
    nuts2 == "IE011" ~ "IE04", # Border
    nuts2 == "IE012" ~ "IE06", # Midland
    nuts2 == "IE013" ~ "IE04", # West
    nuts2 == "IE021" ~ "IE06", # Dublin
    nuts2 == "IE022" ~ "IE06", # Mid_East
    nuts2 == "IE023" ~ "IE05", # Mid_West
    nuts2 == "IE024" ~ "IE05", # South East
    nuts2 == "IE025" ~ "IE05", # South West
    TRUE ~ nuts2 # Default case, if no match found keep the original value
))

census_overall_ireland <- census_overall_ireland %>%
  group_by(ess_year, nuts2) %>%
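  # Sum all regional_ columns within each new NUTS2 region. Note that this also
  # sums the *_share columns, which are not additive across regions.
  summarize(across(starts_with("regional_"), ~ sum(.x, na.rm = TRUE)), .groups = "drop")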

row_to_copy <- census_overall_ireland[census_overall_ireland$ess_year == 2016 &
                                        census_overall_ireland$nuts2 == "IE04", ]

new_row <- row_to_copy

new_row$ess_year <- 2018

new_row$nuts2 <- "IE04"

# Add the new row to the dataframe
census_overall_ireland <- rbind(census_overall_ireland, new_row)

row_to_copy <- census_overall_ireland[census_overall_ireland$ess_year == 2016 & census_overall_ireland$nuts2 == "IE05", ]

new_row <- row_to_copy

new_row$ess_year <- 2018

new_row$nuts2 <- "IE05"
# Add the new row to the dataframe
census_overall_ireland <- rbind(census_overall_ireland, new_row)

row_to_copy <- census_overall_ireland[census_overall_ireland$ess_year == 2016 & census_overall_ireland$nuts2 == "IE06", ]

new_row <- row_to_copy

new_row$ess_year <- 2018

new_row$nuts2 <- "IE06"

# Add the new row to the dataframe
census_overall_ireland <- rbind(census_overall_ireland, new_row)

census_overall <- rbind(census_overall_ireland, census_overall)

values_to_include <- c("LT00")

census_overall_lithuania <- census_overall[census_overall$nuts2 %in% values_to_include, ]

census_overall_lithuania_lt01 <- census_overall_lithuania %>%
  mutate(nuts2 = "LT01")

census_overall_lithuania_lt02 <- census_overall_lithuania %>%
  mutate(nuts2 = "LT02")

df_expanded <- bind_rows(census_overall_lithuania, 
                         census_overall_lithuania_lt01,
                         census_overall_lithuania_lt02)

census_overall <- rbind(df_expanded, census_overall)

values_to_include <- c("HU10")
census_overall_hungary <- census_overall[census_overall$nuts2 %in% values_to_include, ]

census_overall_hungary_hu11 <- census_overall_hungary %>%
  mutate(nuts2 = "HU11")

census_overall_hungary_hu12 <- census_overall_hungary %>%
  mutate(nuts2 = "HU12")

df_expanded <- bind_rows(census_overall_hungary, 
                         census_overall_hungary_hu12, 
                         census_overall_hungary_hu11)

census_overall <- rbind(df_expanded, census_overall)

ess3 <- merge(ess2, census_overall, 
              by = c("nuts2", "ess_year"), 
              all.x = TRUE)

subset_df <- subset(ess3, is.na(regional_foreign_born_EU_share), 
                    select = c(ess_year, nuts2))

subset_df <- subset_df[subset_df$ess_year != 2020, ]

# remove the datasets from the environment 
remove(census_adjusted2001, census_adjusted2011, census_overall, 
       census_overall_hungary,
       census_overall_hungary_hu11, census_overall_hungary_hu12, 
       census_overall_ireland,
       census_overall_lithuania, census_overall_lithuania_lt01, 
       census_overall_lithuania_lt02, census2001,
       census2011, example11, example11_2000, example11_2010)
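
The long chain of gsub() calls used above for the NUTS recoding could also be written as a lookup table. The sketch below is one possible alternative, not the method used in this analysis: `nuts_recode` contains only a small excerpt of the mappings, and `recode_nuts2()` is a hypothetical helper. Unlike gsub(), which replaces the pattern anywhere inside a string, the helper only recodes exact matches, so longer codes that merely contain an old NUTS2 code are left untouched.

```r
# Named lookup vector: names are old codes, values are NUTS 2016 codes
# (a small excerpt of the mappings listed in the gsub() chain)
nuts_recode <- c(
  "LU00" = "LU0",
  "SI02" = "SI04",
  "FR24" = "FRB0",
  "PL11" = "PL71",
  "GR30" = "EL30"
)

# Hypothetical helper: recode exact matches and leave all other codes as they are
recode_nuts2 <- function(codes, lookup) {
  hit <- codes %in% names(lookup)
  codes[hit] <- unname(lookup[codes[hit]])
  codes
}

example_codes <- c("LU00", "FR24", "DE21", "GR30", "FR241")
print(recode_nuts2(example_codes, nuts_recode))
```

In this example "FR241" stays unchanged, whereas gsub("FR24", "FRB0", ...) would turn it into "FRB01". Whether exact matching is preferable depends on whether codes longer than NUTS2 are present in the nuts2 column at that point.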

7. Save the Final Dataset

Save the final ess3 dataset as an RData file.

save(ess3, file = "ess3.RData")

8. Quantitative Data Analysis

8.1 Load the Dataset and Make Some Final Adjustments

Load the Data:

  • Loaded ess3.RData.

Removed Columns:

  • Excluded the indexwelfare column.

Created Welfare Indices:

  • Generated indices based on z-scores, including:

    • indexwelfare_alesina_sbeqsoc_sbprvpv

Added Variables:

  • regional_positive_net_migration_dummy

  • Log-transformed variables:

    • regional_foreign_born_population

    • EU_immigration_cumulative_4yr

    • EU_emigration_cumulative_4yr

Filtered Data:

  • Included only survey rounds 4 and 8.

  • Removed rows with missing values for key columns.

Classified and Subset Data:

  • Created binary indicators for:

    • immigration_affected

    • emigration_affected

Generated subsets:

  • ess3_emigration

  • ess3_immigration
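
The index construction listed above (a row-wise mean of z-scored items) can be sketched on toy data. The item names and values below are invented for illustration; the z-score columns used in the analysis itself are created in an earlier part of this document. The sketch also shows that a respondent with any missing item receives a missing index, because rowMeans() is called without na.rm = TRUE.

```r
# Toy survey items (hypothetical values)
toy <- data.frame(
  item_a = c(1, 2, 3, 4, 5),
  item_b = c(2, 2, 4, 4, 8),
  item_c = c(10, NA, 30, 40, 50)
)

# Standardise each item: (x - mean) / sd, ignoring missing values
z <- as.data.frame(lapply(toy, function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}))

# Row-wise mean of the z-scores; a row with any NA item yields NA
toy$index <- rowMeans(z)

print(round(toy$index, 3))
```

Rows with a missing index are then dropped by the missing-value filter, since the index is one of the columns checked.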

# Load the Data 
load("ess3.RData")

# Create Binary Migration Indicator:
ess3$regional_positive_net_migration_dummy <- ifelse(ess3$regional_net_migration >= 1, 1, 0)

# Construct the Index for Welfare State Support - dependent variable
ess3$indexwelfare_alesina_sbeqsoc_sbprvpv <- rowMeans(ess3[, c("gvcldcr_z_score", 
                                                               "gvslvue_z_score",
                                                               "gvslvol_z_score",
                                                               "gincdif_rescaled_z_score", 
                                                               "sbeqsoc_rescaled_z_score",
                                                               "sbprvpv_rescaled_z_score")])

# Log-Transform Variables
ess3$regional_foreign_born_population_log <- log(ess3$regional_foreign_born_population + 1)

ess3$regional_foreign_born_nonEU_log <- log(ess3$regional_foreign_born_nonEU + 1)

ess3$national_foreign_population_NonEU_log <- log(ess3$national_foreign_population_NonEU)

# Filter the dataset for ESS 2008 and 2016 
ess3 <- ess3 %>%
  filter(essround %in% c(4, 8))

# Remove missing values 
columns_to_check <- c("gndr_dummyfemale", "lrscale", "agea", "unemployed_dummy",
                       "educ", "urban_dummy", "national_unemployment_level",
                       "national_social_protection", 
                      "regional_population_density",
                       "regional_old_age_dependency", 
                      "regional_foreign_born_population",
                       "regional_unemployment", "national_corruption_perception",
                      "indexwelfare_alesina_sbeqsoc_sbprvpv", "trstprl")

ess3 <- ess3 %>%
  filter(if_all(all_of(columns_to_check), ~ !is.na(.x)))

# Transform and Log Migration Data
ess3$EU_net_migration_mean_4yr <- ess3$national_net_migration_4yr

ess3$log_EU_immigration_cumulative_4yr <- log(ess3$EU_immigration_cumulative_4yr + 1)  # Adding 1 to avoid log(0)

ess3$log_EU_emigration_cumulative_4yr <- log(ess3$EU_emigration_cumulative_4yr + 1)  # Adding 1 to avoid log(0)

# Create intra-EU Migration Impact Indicators
ess3$immigration_affected <- ifelse(ess3$EU_net_migration_mean_4yr > 0, 1, 0)

ess3$emigration_affected <- ifelse(ess3$EU_net_migration_mean_4yr < 0, 1, 0)

# Subset Based on Migration Impact
ess3 <- subset(ess3, immigration_affected == 1 | emigration_affected == 1)

# subset the dataset
ess3_emigration <- subset(ess3, emigration_affected == 1)
ess3_immigration <- subset(ess3, immigration_affected == 1)

8.2 Hierarchical Linear Regression Models

This section estimates the hierarchical linear regression models and presents their results.

Hypothesis 1 and model 1

This analysis uses a hierarchical linear model (model1) to explore the impact of cumulative EU immigration on welfare state support. The model controls for gender, age, political placement, education, employment status, urban residence, trust in national parliament, and several regional and national socioeconomic factors. Random effects are included for both country and regional levels. Results are presented with coefficients, standard errors, and significance levels.
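
For reference, the random-intercept structure specified by (1 | cntry) + (1 | cntry:nuts2) can be written out as follows. The notation is introduced here for illustration only and is not taken from the original analysis: i indexes respondents, j regions and k countries, m is the log-transformed cumulative EU immigration measure, and x collects the control variables. The second line gives the variance shares that correspond to the ICC calculations reported after the regression table.

```latex
y_{ijk} = \beta_0 + \beta_1 m_{ijk} + \mathbf{x}_{ijk}^{\top}\boldsymbol{\gamma}
          + u_k + v_{jk} + \varepsilon_{ijk},
\qquad u_k \sim N(0, \sigma_u^2), \quad v_{jk} \sim N(0, \sigma_v^2), \quad
\varepsilon_{ijk} \sim N(0, \sigma_e^2)

\mathrm{ICC}_{\text{country}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2},
\qquad
\mathrm{ICC}_{\text{region}} = \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2},
\qquad
\mathrm{ICC}_{\text{total}} = \frac{\sigma_u^2 + \sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2}
```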

# Model 1 for Hypothesis 1 
model1 <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ log_EU_immigration_cumulative_4yr + 
                 gndr_dummyfemale + agea + lrscale + educ + unemployed_dummy + urban_dummy +
                 trstprl +  regional_unemployment + regional_population_density +
                 regional_old_age_dependency + regional_foreign_born_population_log +
                 national_unemployment_level + national_social_protection +
                 national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_immigration, weights = pspwght)

# display the results 
texreg::knitreg(list(model1), 
                custom.model.names = c("Model 1: Welfare State Support"), 
                caption = "Hierarchical Linear Model Regression Results for Hypothesis 1", 
                digits = 3, custom.coef.names = c("(Intercept)", 
                                                  "Log of Cumulative EU Immigration (t-4)",
                                                  "Gender: Female", "Age", 
                                                  "Left-Right Political Placement", 
                                                  "Education level", 
                                                  "Employment Status: Unemployed",
                                                  "Place of residence: Urban", 
                                                  "Trust in National Parliament", 
                                                  "Regional Unemployment Rate (20-64 years)", 
                                                  "Regional Population Density", 
                                                  "Regional Old Age Dependency", 
                                                  "Regional Foreign-born Population (log)", 
                                                  "National Unemployment Rate (20-64 years)", 
                                                  "National Social Protection Expenditure (% of GDP)",
                                                  "National Corruption Perception Index"))
Hierarchical Linear Model Regression Results for Hypothesis 1

                                                    Model 1: Welfare State Support
(Intercept)                                          1.119***
                                                    (0.296)
Log of Cumulative EU Immigration (t-4)              -0.069**
                                                    (0.026)
Gender: Female                                       0.045***
                                                    (0.005)
Age                                                  0.001***
                                                    (0.000)
Left-Right Political Placement                      -0.054***
                                                    (0.001)
Education level                                     -0.016***
                                                    (0.002)
Employment Status: Unemployed                        0.106**
                                                    (0.034)
Place of residence: Urban                            0.057***
                                                    (0.008)
Trust in National Parliament                         0.013***
                                                    (0.001)
Regional Unemployment Rate (20-64 years)             0.009**
                                                    (0.003)
Regional Population Density                         -0.000
                                                    (0.000)
Regional Old Age Dependency                         -0.003
                                                    (0.001)
Regional Foreign-born Population (log)              -0.027***
                                                    (0.007)
National Unemployment Rate (20-64 years)            -0.024***
                                                    (0.005)
National Social Protection Expenditure (% of GDP)    0.012***
                                                    (0.003)
National Corruption Perception Index                 0.001
                                                    (0.002)

AIC                                                  65829.037
BIC                                                  65992.630
Log Likelihood                                      -32895.518
Num. obs.                                            40546
Num. groups: cntry:nuts2                             160
Num. groups: cntry                                   17
Var: cntry:nuts2 (Intercept)                         0.007
Var: cntry (Intercept)                               0.038
Var: Residual                                        0.257

***p < 0.001; **p < 0.01; *p < 0.05
# ICC at the country level 
ICC_c1 <- (0.038)/(0.038+0.257 + 0.007)
ICC_c1 
## [1] 0.1258278
# ICC at the regional level 
ICC_r1 <- (0.007)/(0.038+0.257+ 0.007)
ICC_r1
## [1] 0.02317881
# ICC total 
ICC_t1 <- (0.038+0.007)/(0.038+0.007+0.257)
ICC_t1
## [1] 0.1490066
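
The three ICC calculations above follow one pattern, which can be wrapped in a small helper. This is a sketch only: `icc_three_level()` is a hypothetical function name, and the inputs are the rounded variance components printed in the Model 1 table. With the fitted object at hand, the unrounded components could instead be read from as.data.frame(lme4::VarCorr(model1)), which is not done here so that the example runs without the data.

```r
# Hypothetical helper: share of total variance at each level of a
# three-level random-intercept model
icc_three_level <- function(var_country, var_region, var_residual) {
  total <- var_country + var_region + var_residual
  c(country = var_country / total,
    region  = var_region / total,
    total   = (var_country + var_region) / total)
}

# Rounded variance components reported for Model 1
icc_model1 <- icc_three_level(var_country = 0.038,
                              var_region = 0.007,
                              var_residual = 0.257)
print(round(icc_model1, 4))
```

Using one function for all models avoids retyping the components in each formula, which is where transcription slips tend to occur.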

Hypothesis 2 and model 2

This analysis utilises a hierarchical linear model (model2) to investigate the effect of cumulative EU emigration on welfare state support. The model includes controls for gender, age, political placement, education, employment status, urban residence, trust in national parliament, and various regional and national socio-economic factors. Random effects are incorporated for both country and regional levels. Results are displayed with coefficients, standard errors, and significance levels.

# Model 2 for Hypothesis 2
model2 <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                 log_EU_emigration_cumulative_4yr + gndr_dummyfemale + 
                 agea + lrscale + educ + unemployed_dummy + urban_dummy + trstprl +
                 regional_unemployment + regional_population_density + 
                 regional_old_age_dependency + regional_foreign_born_population_log +
                 national_unemployment_level + national_social_protection +
                 national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_emigration, weights = pspwght)

# display the results
texreg::knitreg(model2, 
                custom.model.names = c("Model 2: Welfare State Support"), 
                caption = "Hierarchical Linear Model Regression Results for Hypothesis 2", 
                digits = 3, custom.coef.names = c("(Intercept)", 
                "Log of Cumulative EU Emigration (t-4)", "Gender: Female", 
                "Age", 
                "Left-Right Political Placement", "Education level", 
                "Employment Status: Unemployed",
                "Place of residence: Urban", "Trust in National Parliament", 
                "Regional Unemployment Rate (20-64 years)", 
                "Regional Population Density", 
                "Regional Old Age Dependency", "Regional Foreign-born Population (log)", 
                "National Unemployment Rate (20-64 years)", 
                "National Social Protection Expenditure (% of GDP)", 
                "National Corruption Perception Index"), fit.headers = TRUE)
Hierarchical Linear Model Regression Results for Hypothesis 2

                                                    Model 2: Welfare State Support
(Intercept)                                         -1.024
                                                    (0.550)
Log of Cumulative EU Emigration (t-4)                0.113***
                                                    (0.028)
Gender: Female                                       0.058***
                                                    (0.008)
Age                                                  0.002***
                                                    (0.000)
Left-Right Political Placement                      -0.012***
                                                    (0.002)
Education level                                     -0.036***
                                                    (0.003)
Employment Status: Unemployed                        0.080
                                                    (0.045)
Place of residence: Urban                           -0.031**
                                                    (0.010)
Trust in National Parliament                        -0.002
                                                    (0.002)
Regional Unemployment Rate (20-64 years)             0.006
                                                    (0.004)
Regional Population Density                          0.000
                                                    (0.000)
Regional Old Age Dependency                          0.002
                                                    (0.004)
Regional Foreign-born Population (log)               0.014
                                                    (0.015)
National Unemployment Rate (20-64 years)            -0.063**
                                                    (0.021)
National Social Protection Expenditure (% of GDP)   -0.023
                                                    (0.018)
National Corruption Perception Index                 0.011*
                                                    (0.005)

AIC                                                  33766.863
BIC                                                  33916.695
Log Likelihood                                      -16864.432
Num. obs.                                            19651
Num. groups: cntry:nuts2                             87
Num. groups: cntry                                   11
Var: cntry:nuts2 (Intercept)                         0.014
Var: cntry (Intercept)                               0.155
Var: Residual                                        0.291

***p < 0.001; **p < 0.01; *p < 0.05
# ICC at the country level 
ICC_c2 <- (0.155)/(0.155+0.291+0.014)
ICC_c2
## [1] 0.3369565
# ICC at the regional level 
ICC_r2 <- (0.014)/(0.155+0.291+0.014)
ICC_r2
## [1] 0.03043478
# ICC total 
ICC_t2 <- (0.155+0.014)/(0.155+0.014+0.291)
ICC_t2
## [1] 0.3673913

Hypotheses 3 and 4 and Models 3 and 4

Model 3 is a hierarchical linear model that tests whether the association between cumulative EU immigration and welfare state support depends on regional economic disparity, measured by regional_gdp_pps_eur_per_inhabitant_percentage_EU_average (regional GDP per inhabitant relative to the EU average). The model includes the same controls as Model 1: gender, age, political placement, education, employment status, urban residence, trust in national parliament, and the regional and national socio-economic factors. Random intercepts are included at the country and regional levels.

Model 4 is a hierarchical linear model that tests whether the same association depends on individual economic insecurity (lknemny). It uses the same controls and random intercepts as Model 3. Both models are estimated on the immigration-affected subset (ess3_immigration), and the results are reported with coefficients, standard errors, and significance levels.

# Model 3 for Hypothesis 3
model3 <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                 log_EU_immigration_cumulative_4yr*regional_gdp_pps_eur_per_inhabitant_percentage_EU_average + 
                 gndr_dummyfemale + agea + lrscale + educ + unemployed_dummy + 
                 urban_dummy + trstprl +  regional_unemployment + regional_population_density +
                 regional_old_age_dependency + regional_foreign_born_population_log +
                 national_unemployment_level + national_social_protection +
                 national_corruption_perception + 
                 (1 | cntry) + (1 | cntry:nuts2),
               data = ess3_immigration, weights = pspwght)

# Model 4 for Hypothesis 4 
model4 <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                 log_EU_immigration_cumulative_4yr*lknemny +
                 gndr_dummyfemale + agea + lrscale + educ + unemployed_dummy + urban_dummy + 
                 trstprl +  regional_unemployment + regional_population_density +
                 regional_old_age_dependency + regional_foreign_born_population_log +
                 national_unemployment_level + national_social_protection +
                 national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_immigration, weights = pspwght)

# display the results
texreg::knitreg(list(model3, model4), 
          custom.model.names = c("Model 3: Welfare State Support", 
                                 "Model 4: Welfare State Support"), 
          caption = "Hierarchical Linear Model Regression Results for Hypotheses 3 and 4", 
          digits = 3, custom.coef.names = c("(Intercept)", 
                                            "Log of Cumulative EU Immigration (t-4)", 
                                            "Regional Economic Disparity", 
                                            "Gender: Female", "Age", 
                                            "Left-Right Political Placement", 
                                            "Education Level",
                                            "Employment Status: Unemployed", 
                                            "Place of Residence: Urban", 
                                            "Trust in National Parliament", 
                                            "Regional Unemployment Rate (20-64 years)", 
                                            "Regional Population Density", 
                                            "Regional Old Age Dependency",
                                            "Regional Foreign-born Population (log)", 
                                            "National Unemployment Rate (20-64 years)", 
                                            "National Social Protection Expenditure (% of GDP)", 
                                            "National Corruption Perception Index", 
                                            "Log of Cumulative EU Immigration (t-4) x Regional Economic Disparity", 
                                            "Individual Economic Insecurities", 
                                            "Log of Cumulative EU Immigration (t-4) x Individual Economic Insecurities"), 
          fit.headers = TRUE, 
          se = TRUE)
Hierarchical Linear Model Regression Results for Hypotheses 3 and 4
(both models: Welfare State Support)

                                                                            Model 3       Model 4
(Intercept)                                                                  0.861         1.060***
                                                                            (0.444)       (0.299)
Log of Cumulative EU Immigration (t-4)                                      -0.047        -0.073**
                                                                            (0.037)       (0.027)
Regional Economic Disparity                                                  0.002
                                                                            (0.002)
Gender: Female                                                               0.046***      0.041***
                                                                            (0.005)       (0.005)
Age                                                                          0.001***      0.001***
                                                                            (0.000)       (0.000)
Left-Right Political Placement                                              -0.053***     -0.054***
                                                                            (0.001)       (0.001)
Education Level                                                             -0.015***     -0.014***
                                                                            (0.002)       (0.002)
Employment Status: Unemployed                                                0.102**       0.058
                                                                            (0.035)       (0.035)
Place of Residence: Urban                                                    0.057***      0.054***
                                                                            (0.008)       (0.008)
Trust in National Parliament                                                 0.013***      0.016***
                                                                            (0.001)       (0.001)
Regional Unemployment Rate (20-64 years)                                     0.007*        0.008**
                                                                            (0.003)       (0.003)
Regional Population Density                                                  0.000        -0.000
                                                                            (0.000)       (0.000)
Regional Old Age Dependency                                                 -0.003*       -0.003*
                                                                            (0.001)       (0.001)
Regional Foreign-born Population (log)                                      -0.022**      -0.026***
                                                                            (0.008)       (0.007)
National Unemployment Rate (20-64 years)                                    -0.025***     -0.025***
                                                                            (0.005)       (0.005)
National Social Protection Expenditure (% of GDP)                            0.012***      0.012***
                                                                            (0.003)       (0.003)
National Corruption Perception Index                                         0.001         0.001
                                                                            (0.002)       (0.002)
Log of Cumulative EU Immigration (t-4) x Regional Economic Disparity        -0.000
                                                                            (0.000)
Individual Economic Insecurities                                                          -0.054
                                                                                          (0.030)
Log of Cumulative EU Immigration (t-4) x Individual Economic Insecurities                  0.007**
                                                                                          (0.002)

AIC                                                                          63286.671     60445.583
BIC                                                                          63466.686     60625.085
Log Likelihood                                                              -31622.336    -30201.791
Num. obs.                                                                    39032         38091
Num. groups: cntry:nuts2                                                     160           151
Num. groups: cntry                                                           17            16
Var: cntry:nuts2 (Intercept)                                                 0.007         0.006
Var: cntry (Intercept)                                                       0.038         0.039
Var: Residual                                                                0.256         0.248

***p < 0.001; **p < 0.01; *p < 0.05
# Model 3: ICC at the country level
ICC_c3 <- (0.038)/(0.038+0.007+0.256)
ICC_c3
## [1] 0.1262458
# ICC at the regional level 
ICC_r3 <- (0.007)/(0.038+0.007+0.256)
ICC_r3
## [1] 0.02325581
# ICC total 
ICC_t3 <- (0.007+0.038)/(0.038+0.007+0.256)
ICC_t3
## [1] 0.1495017
# Model 4: ICC at the country level
ICC_c4 <- (0.039)/(0.039+0.006+0.248)
ICC_c4
## [1] 0.1331058
# ICC at the regional level 
ICC_r4 <- (0.006)/(0.039+0.006+0.248)
ICC_r4
## [1] 0.02047782
# ICC total 
ICC_t4 <- (0.039+0.006)/(0.039+0.006+0.248)
ICC_t4
## [1] 0.1535836
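
The calculations above repeat one formula: each variance component is divided by the sum of all three. A minimal base-R sketch of that formula as a reusable function follows. The function name `icc_three_level` and its argument names are hypothetical, and the inputs are the rounded variance components printed in the table above, so the results agree with the hand calculations only up to that rounding.

```r
# Hypothetical helper (not part of the original analysis): intraclass
# correlations for a three-level random-intercept model, computed from
# the country, region and residual variance components.
icc_three_level <- function(var_country, var_region, var_residual) {
  total <- var_country + var_region + var_residual
  c(country = var_country / total,
    region  = var_region / total,
    total   = (var_country + var_region) / total)
}

# Rounded variance components taken from the regression table above
icc_model3 <- icc_three_level(var_country = 0.038, var_region = 0.007,
                              var_residual = 0.256)
icc_model4 <- icc_three_level(var_country = 0.039, var_region = 0.006,
                              var_residual = 0.248)

round(icc_model3, 4)
round(icc_model4, 4)
```

Reading the unrounded components from the fitted model, for example with `as.data.frame(VarCorr(model3))`, would avoid the rounding error introduced by typing the table values in by hand.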

Hypotheses 5 and 6 and Models 5 and 6

This analysis employs a hierarchical linear model (model5) to assess the interaction between cumulative EU emigration and regional economic disparity on welfare state support. The model includes controls for gender, age, political placement, education, employment status, urban residence, and various regional and national socio-economic factors. Random effects are included for both country and regional levels. Results are presented with coefficients, standard errors, and significance levels.

This analysis uses a hierarchical linear model (model6) to investigate how the interaction between cumulative EU emigration and individual economic insecurities affects welfare state support. The model accounts for gender, age, political placement, education, employment status, urban residence, and various regional and national socio-economic factors. Random effects are included for both country and regional levels. Results are provided with coefficients, standard errors, and significance levels.

# Model 5 for Hypothesis 5 
model5 <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                 log_EU_emigration_cumulative_4yr*regional_gdp_pps_eur_per_inhabitant_percentage_EU_average+ 
                 gndr_dummyfemale + agea + lrscale + educ + 
                 unemployed_dummy + urban_dummy + trstprl +  
                 regional_unemployment + regional_population_density + 
                 regional_old_age_dependency +
                 regional_foreign_born_population_log + national_unemployment_level +
                 national_social_protection + national_corruption_perception + 
                 (1 | cntry) + (1 | cntry:nuts2),data = ess3_emigration, weights = pspwght)

# Model 6 for Hypothesis 6 
model6 <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                 log_EU_emigration_cumulative_4yr*lknemny +
                 gndr_dummyfemale + agea + lrscale + educ + unemployed_dummy + 
                 urban_dummy + 
                 trstprl +  regional_unemployment + regional_population_density +
                 regional_old_age_dependency + regional_foreign_born_population_log +
                 national_unemployment_level + 
                 national_social_protection +
                 national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2), 
               data = ess3_emigration, weights = pspwght)

# display the results
texreg::knitreg(list(model5, model6), 
          custom.model.names = c("Model 5: Welfare State Support", 
                                 "Model 6: Welfare State Support"), 
          caption = "Hierarchical Linear Model Regression Results for Hypotheses 5 and 6", 
          digits = 3, custom.coef.names = c("(Intercept)", 
                                            "Log of Cumulative EU Emigration (t-4)",
                                            "Regional Economic Disparity", "Gender: Female", 
                                            "Age", 
                                            "Left-Right Political Placement", 
                                            "Education Level",
                                            "Employment Status: Unemployed", 
                                            "Place of Residence: Urban",
                                            "Trust in National Parliament", 
                                            "Regional Unemployment Rate (20-64 years)", 
                                            "Regional Population Density", 
                                            "Regional Old Age Dependency", 
                                            "Regional Foreign-born Population (log)", 
                                            "National Unemployment Rate (20-64 years)", 
                                            "National Social Protection Expenditure (% of GDP)", 
                                            "National Corruption Perception Index", 
                                            "Log of Cumulative EU Emigration (t-4) x Regional Economic Disparity", 
                                            "Individual Economic Insecurities", 
                                            "Log of Cumulative EU Emigration (t-4) x Individual Economic Insecurities"),
          fit.headers = TRUE, 
          se = TRUE)
Hierarchical Linear Model Regression Results for Hypotheses 5 and 6
Dependent variable in both models: Welfare State Support. Standard errors in parentheses.

Term                                                                       Model 5               Model 6
(Intercept)                                                                -1.182 (0.683)        -1.141* (0.515)
Log of Cumulative EU Emigration (t-4)                                      0.126** (0.040)       0.100*** (0.027)
Regional Economic Disparity                                                0.001 (0.004)
Gender: Female                                                             0.058*** (0.008)      0.057*** (0.008)
Age                                                                        0.002*** (0.000)      0.002*** (0.000)
Left-Right Political Placement                                             -0.012*** (0.002)     -0.012*** (0.002)
Education Level                                                            -0.036*** (0.003)     -0.031*** (0.003)
Employment Status: Unemployed                                              0.080 (0.045)         0.052 (0.045)
Place of Residence: Urban                                                  -0.031** (0.010)      -0.032** (0.010)
Trust in National Parliament                                               -0.002 (0.002)        -0.001 (0.002)
Regional Unemployment Rate (20-64 years)                                   0.002 (0.007)         0.005 (0.004)
Regional Population Density                                                0.000 (0.000)         0.000* (0.000)
Regional Old Age Dependency                                                0.002 (0.004)         0.003 (0.004)
Regional Foreign-born Population (log)                                     0.017 (0.016)         0.016 (0.015)
National Unemployment Rate (20-64 years)                                   -0.061** (0.021)      -0.054** (0.019)
National Social Protection Expenditure (% of GDP)                          -0.019 (0.019)        -0.014 (0.017)
National Corruption Perception Index                                       0.010* (0.005)        0.008 (0.004)
Log of Cumulative EU Emigration (t-4) x Regional Economic Disparity        -0.000 (0.000)
Individual Economic Insecurities                                                                 0.063* (0.031)
Log of Cumulative EU Emigration (t-4) x Individual Economic Insecurities                         -0.002 (0.003)

AIC                                                                        33796.085             32206.097
BIC                                                                        33961.688             32370.799
Log Likelihood                                                             -16877.042            -16082.048
Num. obs.                                                                  19651                 18826
Num. groups: cntry:nuts2                                                   87                    87
Num. groups: cntry                                                         11                    11
Var: cntry:nuts2 (Intercept)                                               0.015                 0.014
Var: cntry (Intercept)                                                     0.158                 0.107
Var: Residual                                                              0.291                 0.289
***p < 0.001; **p < 0.01; *p < 0.05
# Model 5: ICC at the country level
ICC_c5 <- (0.158)/(0.158+0.015+0.291)
ICC_c5
## [1] 0.3405172
# ICC at the regional level 
ICC_r5 <- (0.015)/(0.158+0.015+0.291)
ICC_r5
## [1] 0.03232759
# ICC total 
ICC_t5 <- (0.158+0.015)/(0.158+0.015+0.291)
ICC_t5
## [1] 0.3728448
# Model 6: ICC at the country level
ICC_c6 <- (0.11)/(0.11+0.01+0.29)
ICC_c6
## [1] 0.2682927
# ICC at the regional level 
ICC_r6 <- (0.01)/(0.11+0.01+0.29)
ICC_r6
## [1] 0.02439024
# ICC total 
ICC_t6 <- (0.11+0.01)/(0.11+0.01+0.29)
ICC_t6
## [1] 0.2926829
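
In the interaction models, the coefficient on the migration variable is its slope only where the moderator equals zero; at other moderator values the implied slope is the main-effect coefficient plus the interaction coefficient times the moderator. The sketch below works this through with the rounded Model 6 point estimates from the table above. The function name and the moderator values 1 to 4 are assumptions made for illustration (the coding of `lknemny` is not shown in this section), and no standard errors are computed, since those would require the coefficient covariance matrix of the fitted model.

```r
# Hypothetical helper: slope of the focal predictor at given moderator values
conditional_slope <- function(b_main, b_interaction, moderator) {
  b_main + b_interaction * moderator
}

# Assumed moderator values, for illustration only
insecurity_values <- 1:4

# Rounded Model 6 point estimates: main effect 0.100, interaction -0.002
slopes_model6 <- conditional_slope(b_main = 0.100,
                                   b_interaction = -0.002,
                                   moderator = insecurity_values)

data.frame(insecurity = insecurity_values,
           emigration_slope = slopes_model6)
```

Because the interaction coefficient is small relative to the main effect, the implied slope changes little across these assumed moderator values.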

8.3 Robustness Checks

Regression Diagnostics

In this step, I perform essential diagnostic checks on the regression models to assess their validity and ensure that the assumptions underlying the regression analysis are met. This includes evaluating multicollinearity, the normality of residuals, homoscedasticity, leverage points, and the distribution of the dependent variable. The results indicate issues with heteroscedasticity, non-normality of residuals, and the non-normal distribution of the dependent variable. Additionally, Model 2 shows signs of multicollinearity among certain predictors. The subsequent chapter will apply generalised linear regression models to address these issues.

### model 1 ###

# Check for multicollinearity -> there are no high scores, indicating that there is no significant multicollinearity
vif(model1) 
##    log_EU_immigration_cumulative_4yr                     gndr_dummyfemale 
##                             1.803366                             1.005136 
##                                 agea                              lrscale 
##                             1.018617                             1.008938 
##                                 educ                     unemployed_dummy 
##                             1.033587                             1.003454 
##                          urban_dummy                              trstprl 
##                             1.028875                             1.030604 
##                regional_unemployment          regional_population_density 
##                             1.849164                             1.318981 
##          regional_old_age_dependency regional_foreign_born_population_log 
##                             1.520602                             1.323818 
##          national_unemployment_level           national_social_protection 
##                             2.331554                             2.337923 
##       national_corruption_perception 
##                             1.446101
# Are the residuals normally distributed? -> maybe not entirely 
qqnorm(resid(model1))
qqline(resid(model1))

# Is there homoscedasticity? -> there is some potential heteroscedasticity
residuals <- residuals(model1)
fitted <- fitted(model1)

ggplot(data = NULL, aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_smooth(method = "loess", color = "blue") +
  geom_hline(yintercept = 0, color = "red") +
  labs(x = "Fitted values", y = "Residuals") +
  ggtitle("Residuals vs Fitted Values - Heteroscedasticity")

# Are there high-leverage points? Flag observations whose hat value exceeds three times the mean hat value
outliers <- hatvalues(model1) > 3 * mean(hatvalues(model1))
ess3_immigration_clean <- ess3_immigration[!outliers, ]
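
The leverage rule above flags observations whose hat value exceeds three times the mean hat value and then drops them by position. Positional indexing is only safe if the model used every row of the data frame; if rows were dropped for missing values, matching on row names is more robust. The following sketch illustrates the idea with a plain `lm()` fit on the built-in `mtcars` data, which stands in for the survey data. The artificial missing value and all object names are assumptions for illustration.

```r
# Illustration on built-in data (mtcars stands in for the survey data)
dat <- mtcars
dat$mpg[3] <- NA                 # artificial missing outcome; lm() drops this row

fit <- lm(mpg ~ wt + hp, data = dat)

# Hat values exist only for the rows the model actually used
h <- hatvalues(fit)
high_leverage <- h > 3 * mean(h) # same 3 x mean rule as above

# Drop flagged rows by row name, so the flags cannot shift onto other rows
flagged_rows <- names(h)[high_leverage]
dat_clean <- dat[!(rownames(dat) %in% flagged_rows), ]

c(rows_in_data  = nrow(dat),
  rows_in_model = length(h),
  rows_flagged  = length(flagged_rows))
```

If the number of rows in the data and in the model differ, as they do here, a logical vector built from the hat values no longer lines up with the data frame row for row.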

# Is my dependent variable normally distributed? Kolmogorov-Smirnov test
ks.test(ess3_immigration$indexwelfare_alesina_sbeqsoc_sbprvpv, "pnorm", mean(ess3_immigration$indexwelfare_alesina_sbeqsoc_sbprvpv), sd(ess3_immigration$indexwelfare_alesina_sbeqsoc_sbprvpv))
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  ess3_immigration$indexwelfare_alesina_sbeqsoc_sbprvpv
## D = 0.032829, p-value < 2.2e-16
## alternative hypothesis: two-sided
hist(ess3_immigration$indexwelfare_alesina_sbeqsoc_sbprvpv,
     main = "Histogram of Dependent Variable",
     xlab = "Index Welfare",
     col = "lightblue",
     border = "black")

# Given that the p-value is extremely small, this suggests that the dependent variable does not follow a normal distribution.


### model 2 ###
# Check for multicollinearity -> several VIF scores exceed 10, indicating substantial multicollinearity: national_unemployment_level and log_EU_emigration_cumulative_4yr are highest, followed by national_corruption_perception and national_social_protection
vif(model2) 
##     log_EU_emigration_cumulative_4yr                     gndr_dummyfemale 
##                            16.695845                             1.011168 
##                                 agea                              lrscale 
##                             1.035028                             1.009853 
##                                 educ                     unemployed_dummy 
##                             1.059681                             1.004039 
##                          urban_dummy                              trstprl 
##                             1.022981                             1.019741 
##                regional_unemployment          regional_population_density 
##                             2.265820                             1.057180 
##          regional_old_age_dependency regional_foreign_born_population_log 
##                             3.390694                             3.008050 
##          national_unemployment_level           national_social_protection 
##                            24.813773                            10.371098 
##       national_corruption_perception 
##                            11.349394
# Are the residuals normally distributed? -> maybe not entirely 
qqnorm(resid(model2))
qqline(resid(model2))

# Is there homoscedasticity? -> there is some potential heteroscedasticity
residuals <- residuals(model2)
fitted <- fitted(model2)

ggplot(data = NULL, aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_smooth(method = "loess", color = "blue") +
  geom_hline(yintercept = 0, color = "red") +
  labs(x = "Fitted values", y = "Residuals") +
  ggtitle("Residuals vs Fitted Values - Heteroscedasticity")

# Are there high-leverage points? Flag observations whose hat value exceeds three times the mean hat value
outliers <- hatvalues(model2) > 3 * mean(hatvalues(model2))
ess3_emigration_clean <- ess3_emigration[!outliers, ]

# Is my dependent variable normally distributed? Kolmogorov-Smirnov test -> the result indicates that my dependent variable is not normally distributed
ks.test(ess3_emigration$indexwelfare_alesina_sbeqsoc_sbprvpv, "pnorm", mean(ess3_emigration$indexwelfare_alesina_sbeqsoc_sbprvpv), sd(ess3_emigration$indexwelfare_alesina_sbeqsoc_sbprvpv))
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  ess3_emigration$indexwelfare_alesina_sbeqsoc_sbprvpv
## D = 0.038769, p-value < 2.2e-16
## alternative hypothesis: two-sided
hist(ess3_emigration$indexwelfare_alesina_sbeqsoc_sbprvpv,
     main = "Histogram of Dependent Variable",
     xlab = "Index Welfare",
     col = "lightblue",
     border = "black")
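
A common rule of thumb treats VIF values above 10 (some authors use 5) as a sign of problematic multicollinearity. The sketch below applies that cut-off to the Model 2 VIF values printed above. The vector is typed in by hand from that output, and the threshold of 10 is a convention assumed here, not a value taken from the analysis.

```r
# VIF values for Model 2, copied from the vif(model2) output above
vif_model2 <- c(
  log_EU_emigration_cumulative_4yr     = 16.695845,
  gndr_dummyfemale                     = 1.011168,
  agea                                 = 1.035028,
  lrscale                              = 1.009853,
  educ                                 = 1.059681,
  unemployed_dummy                     = 1.004039,
  urban_dummy                          = 1.022981,
  trstprl                              = 1.019741,
  regional_unemployment                = 2.265820,
  regional_population_density          = 1.057180,
  regional_old_age_dependency          = 3.390694,
  regional_foreign_born_population_log = 3.008050,
  national_unemployment_level          = 24.813773,
  national_social_protection           = 10.371098,
  national_corruption_perception       = 11.349394
)

# Assumed rule-of-thumb threshold
vif_threshold <- 10

# Predictors above the threshold, largest first
high_vif <- sort(vif_model2[vif_model2 > vif_threshold], decreasing = TRUE)
high_vif
```

With the fitted model available, the same filter could be applied directly to the result of `vif(model2)` instead of a hand-typed vector.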

Applying Generalised Linear Models for Hypotheses 1-6

Because my dependent variable is negatively skewed and the diagnostics point to heteroscedasticity and non-normally distributed residuals, I re-estimate the models as generalised linear mixed models using glmmTMB with a Gaussian family. This approach is intended for continuous data that may not strictly follow a normal distribution. Additionally, I use the cleaned datasets, which exclude the high-leverage observations identified above, to check the robustness of the model results.

#### Immigration 

# Model 1 
model1_glmm <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                         log_EU_immigration_cumulative_4yr +
                         gndr_dummyfemale + agea + lrscale + educ + unemployed_dummy +
                         urban_dummy +
                         trstprl + regional_unemployment + 
                         regional_population_density +
                         regional_old_age_dependency + 
                         regional_foreign_born_population_log +
                         national_unemployment_level + 
                         national_social_protection +
                         national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                      data = ess3_immigration_clean, 
                      family = gaussian())

# Model 3 
model3_glmm <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                         log_EU_immigration_cumulative_4yr *regional_gdp_pps_eur_per_inhabitant_percentage_EU_average +
                         gndr_dummyfemale + agea + lrscale + 
                         educ + unemployed_dummy + urban_dummy + 
                         trstprl +  regional_unemployment + 
                         regional_population_density +
                         regional_old_age_dependency + 
                         regional_foreign_born_population_log +
                         national_unemployment_level + 
                         national_social_protection +
                         national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_immigration_clean, weights = pspwght, 
                family = gaussian())

# Model 4
model4_glmm <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                         log_EU_immigration_cumulative_4yr*lknemny +
                         gndr_dummyfemale + agea + lrscale + educ + 
                         unemployed_dummy + 
                         urban_dummy + trstprl +  
                         regional_unemployment + 
                         regional_population_density + 
                         regional_old_age_dependency +
                         regional_foreign_born_population_log + 
                         national_unemployment_level +
                         national_social_protection + 
                         national_corruption_perception + (1 | cntry) +
                         (1 | cntry:nuts2),
                data = ess3_immigration_clean, weights = pspwght, 
                family = gaussian())

#### Emigration 

# Model 2
model2_glmm <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                         log_EU_emigration_cumulative_4yr +
                         gndr_dummyfemale + agea + 
                         lrscale + educ + 
                         unemployed_dummy + urban_dummy + trstprl + 
                         regional_unemployment + regional_population_density + 
                         regional_old_age_dependency + 
                         regional_foreign_born_population_log +
                         national_unemployment_level + 
                         national_social_protection +
                         national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_emigration_clean, 
                weights = pspwght, family = gaussian())

# Model 5 
model5_glmm <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                         log_EU_emigration_cumulative_4yr*regional_gdp_pps_eur_per_inhabitant_percentage_EU_average + 
                         gndr_dummyfemale + agea + lrscale + 
                         educ + unemployed_dummy + urban_dummy + 
                         trstprl +  regional_unemployment +
                         regional_population_density + 
                         regional_old_age_dependency +
                         regional_foreign_born_population_log + 
                         national_unemployment_level +
                         national_social_protection + 
                         national_corruption_perception +
                         (1 | cntry) + (1 | cntry:nuts2), 
                       data = ess3_emigration_clean, 
                       weights = pspwght, family = gaussian())

# Model 6 
model6_glmm <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                         log_EU_emigration_cumulative_4yr*lknemny +
                         gndr_dummyfemale + agea + lrscale + educ + 
                         unemployed_dummy + urban_dummy +
                         trstprl +  regional_unemployment +
                         regional_population_density +
                         regional_old_age_dependency + 
                         regional_foreign_born_population_log +
                         national_unemployment_level + 
                         national_social_protection +
                         national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2), 
                       data = ess3_emigration_clean, 
                       weights = pspwght, family = gaussian())
dispersion_estimate_model1 <- summary(model1_glmm)$sigma^2
dispersion_estimate_model3 <- summary(model3_glmm)$sigma^2
dispersion_estimate_model4 <- summary(model4_glmm)$sigma^2
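
For a Gaussian family the dispersion parameter is the residual variance, which is why the estimates above are obtained by squaring sigma. The following base-R sketch shows that relationship with `lm()` and `glm()` on the built-in `mtcars` data. This is a stand-in chosen for illustration: the mixed models above are fitted with glmmTMB, whose maximum-likelihood estimate can differ slightly from the degrees-of-freedom-corrected value computed here.

```r
# Same linear predictor fitted two ways on built-in data
fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(mpg ~ wt, data = mtcars, family = gaussian())

# Residual variance from the linear model
residual_variance <- sigma(fit_lm)^2

# Dispersion parameter reported for the Gaussian GLM
glm_dispersion <- summary(fit_glm)$dispersion

# The two quantities coincide for a Gaussian family with identity link
all.equal(residual_variance, glm_dispersion)
```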


# display the results
texreg::knitreg(
  list(model1_glmm, model3_glmm, model4_glmm), 
  custom.model.names = c("Model 1: Welfare State Support", 
                         "Model 3: Welfare State Support",
                         "Model 4: Welfare State Support"), 
  caption = "Generalised Linear Mixed Model Regression Results for Hypotheses 1, 3 and 4", 
  digits = 3, 
  custom.coef.names = c("(Intercept)", 
                        "Log of Cumulative EU Immigration (t-4)",
                        "Gender: Female", "Age", 
                        "Left-Right Political Placement",
                        "Education Level", 
                        "Employment Status: Unemployed", 
                        "Place of Residence: Urban", 
                        "Trust in National Parliament",
                        "Regional Unemployment Rate (20-64 years)", 
                        "Regional Population Density", 
                        "Regional Old Age Dependency", 
                        "Regional Foreign-born Population (log)", 
                        "National Unemployment Rate (20-64 years)", 
                        "National Social Protection Expenditure (% of GDP)",
                        "National Corruption Perception Index", 
                        "Regional Economic Disparity", 
                        "Log of Cumulative EU Immigration (t-4) x Regional Economic Disparity", 
                        "Individual Economic Insecurities", 
                        "Log of Cumulative EU Immigration (t-4) x Individual Economic Insecurities"),
  fit.headers = TRUE, 
  se = TRUE
)
Generalised Linear Mixed Model Regression Results for Hypotheses 1, 3 and 4
Dependent variable in all models: Welfare State Support. Standard errors in parentheses.

Term                                                                       Model 1               Model 3               Model 4
(Intercept)                                                                1.241*** (0.327)      0.732 (0.449)         1.012*** (0.305)
Log of Cumulative EU Immigration (t-4)                                     -0.085** (0.029)      -0.040 (0.037)        -0.073** (0.027)
Gender: Female                                                             0.048*** (0.005)      0.046*** (0.005)      0.040*** (0.005)
Age                                                                        0.001*** (0.000)      0.001*** (0.000)      0.001*** (0.000)
Left-Right Political Placement                                             -0.055*** (0.001)     -0.054*** (0.001)     -0.056*** (0.001)
Education Level                                                            -0.018*** (0.002)     -0.015*** (0.002)     -0.014*** (0.002)
Employment Status: Unemployed                                              0.157*** (0.042)      0.149** (0.047)       0.123** (0.046)
Place of Residence: Urban                                                  0.051*** (0.008)      0.055*** (0.008)      0.055*** (0.008)
Trust in National Parliament                                               0.013*** (0.001)      0.014*** (0.001)      0.017*** (0.001)
Regional Unemployment Rate (20-64 years)                                   0.013*** (0.003)      0.011** (0.003)       0.011*** (0.003)
Regional Population Density                                                -0.000 (0.000)        0.000 (0.000)         -0.000 (0.000)
Regional Old Age Dependency                                                -0.003* (0.001)       -0.003* (0.001)       -0.003* (0.001)
Regional Foreign-born Population (log)                                     -0.022** (0.007)      -0.020* (0.008)       -0.023*** (0.006)
National Unemployment Rate (20-64 years)                                   -0.028*** (0.005)     -0.027*** (0.005)     -0.027*** (0.005)
National Social Protection Expenditure (% of GDP)                          0.014*** (0.003)      0.012*** (0.003)      0.012*** (0.003)
National Corruption Perception Index                                       0.001 (0.002)         0.001 (0.002)         0.002 (0.002)
Regional Economic Disparity                                                                      0.003 (0.002)
Log of Cumulative EU Immigration (t-4) x Regional Economic Disparity                             -0.000 (0.000)
Individual Economic Insecurities                                                                                       -0.056 (0.032)
Log of Cumulative EU Immigration (t-4) x Individual Economic Insecurities                                              0.008** (0.003)

AIC                                                                        58775.883             54082.782             52126.279
Log Likelihood                                                             -29368.941            -27020.391            -26042.140
Num. obs.                                                                  39074                 37571                 36784
Num. groups: cntry                                                         17                    17                    16
Num. groups: cntry:nuts2                                                   152                   152                   143
Var: cntry (Intercept)                                                     0.037                 0.033                 0.035
Var: cntry:nuts2 (Intercept)                                               0.006                 0.005                 0.003
***p < 0.001; **p < 0.01; *p < 0.05
# Consider models 2, 5 and 6 
dispersion_estimate_model2 <- summary(model2_glmm)$sigma^2
dispersion_estimate_model5 <- summary(model5_glmm)$sigma^2
dispersion_estimate_model6 <- summary(model6_glmm)$sigma^2


# display the results
texreg::htmlreg(list(model2_glmm, model5_glmm, model6_glmm), 
          custom.model.names = c("Model 2: Welfare State Support", 
                                 "Model 5: Welfare State Support",
                                 "Model 6: Welfare State Support"), 
          caption = "Generalised Linear Mixed Model Regression Results for Hypotheses 2, 5 and 6", 
          digits = 3, custom.coef.names = c("(Intercept)", 
                                            "Log of Cumulative EU Emigration (t-4)", 
                                            "Gender: Female", "Age", 
                                            "Left-Right Political Placement",
                                            "Education Level", 
                                            "Employment Status: Unemployed", 
                                            "Place of Residence: Urban", "Trust in National Parliament",
                                            "Regional Unemployment Rate (20-64 years)", 
                                            "Regional Population Density", 
                                            "Regional Old Age Dependency",
                                            "Regional Foreign-born Population (log)", 
                                            "National Unemployment Rate (20-64 years)", 
                                            "National Social Protection Expenditure (% of GDP)", 
                                            "National Corruption Perception Index", 
                                            "Regional Economic Disparity", 
                                            "Log of Cumulative EU Emigration (t-4) x Regional Economic Disparity", 
                                            "Individual Economic Insecurities", 
                                            "Log of Cumulative EU Emigration (t-4) x Individual Economic Insecurities")
)
Generalised Linear Mixed Model Regression Results for Hypotheses 2, 5 and 6
Dependent variable in all models: Welfare State Support. Standard errors in parentheses.

Term                                                                       Model 2               Model 5               Model 6
(Intercept)                                                                -0.854 (0.659)        -0.653 (0.865)        -0.393 (0.650)
Log of Cumulative EU Emigration (t-4)                                      0.099* (0.048)        0.086 (0.066)         0.038 (0.052)
Gender: Female                                                             0.057*** (0.008)      0.057*** (0.008)      0.057*** (0.008)
Age                                                                        0.002*** (0.000)      0.002*** (0.000)      0.002*** (0.000)
Left-Right Political Placement                                             -0.011*** (0.002)     -0.011*** (0.002)     -0.010*** (0.002)
Education Level                                                            -0.036*** (0.003)     -0.036*** (0.003)     -0.031*** (0.004)
Employment Status: Unemployed                                              0.132* (0.056)        0.132* (0.056)        0.108 (0.056)
Place of Residence: Urban                                                  -0.028** (0.010)      -0.028** (0.010)      -0.031** (0.010)
Trust in National Parliament                                               -0.002 (0.002)        -0.002 (0.002)        -0.001 (0.002)
Regional Unemployment Rate (20-64 years)                                   0.001 (0.006)         -0.000 (0.008)        0.000 (0.005)
Regional Population Density                                                0.000 (0.000)         0.000 (0.000)         0.000 (0.000)
Regional Old Age Dependency                                                0.001 (0.005)         -0.000 (0.005)        0.001 (0.004)
Regional Foreign-born Population (log)                                     0.010 (0.023)         0.011 (0.023)         0.014 (0.017)
National Unemployment Rate (20-64 years)                                   -0.051 (0.027)        -0.046 (0.030)        -0.014 (0.031)
National Social Protection Expenditure (% of GDP)                          -0.019 (0.018)        -0.016 (0.018)        -0.008 (0.011)
National Corruption Perception Index                                       0.009 (0.007)         0.008 (0.008)         -0.000 (0.007)
Regional Economic Disparity                                                                      -0.001 (0.004)
Log of Cumulative EU Emigration (t-4) x Regional Economic Disparity                              0.000 (0.000)
Individual Economic Insecurities                                                                                       0.073* (0.032)
Log of Cumulative EU Emigration (t-4) x Individual Economic Insecurities                                               -0.003 (0.003)

AIC                                                                        29562.119             29565.923             28220.812
Log Likelihood                                                             -14762.060            -14761.962            -14089.406
Num. obs.                                                                  18689                 18689                 17913
Num. groups: cntry                                                         11                    11                    11
Num. groups: cntry:nuts2                                                   73                    73                    73
Var: cntry (Intercept)                                                     0.112                 0.101                 0.022
Var: cntry:nuts2 (Intercept)                                               0.012                 0.012                 0.011
***p < 0.001; **p < 0.01; *p < 0.05

Measuring the Extent of EU Immigration at the Regional Level: Proportion of Foreign-born Individuals from Other EU Countries

This code prepares and analyses data to explore the relationship between the regional share of EU-born immigrants and support for the welfare state. First, the data is filtered to include only regions affected by immigration. Then, any missing values in key columns are removed to ensure that the analysis is based on complete data. The code proceeds by estimating a generalised linear mixed model (GLMM), where the dependent variable is welfare state support, and the main independent variable is the regional share of EU-born immigrants, along with the other control variables. The model also accounts for random effects at the country and regional levels.
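
The missing-value step described above keeps only rows that are complete on a chosen set of columns. A minimal base-R sketch on a toy data frame follows. The data and column names are invented for illustration, and `complete.cases()` is used here as a stand-in for the dplyr filter in the analysis code.

```r
# Toy data (invented): two checked columns and one unchecked column
toy <- data.frame(
  welfare_index = c(0.2, NA, 0.5, 0.1),
  age           = c(34, 51, NA, 29),
  notes         = c(NA, "a", "b", "c")  # not checked, so its NA drops no row
)

# Only these columns must be non-missing
columns_to_check_toy <- c("welfare_index", "age")

# Keep rows that are complete on the checked columns
toy_complete <- toy[complete.cases(toy[, columns_to_check_toy]), ]

nrow(toy_complete)
```

Rows 2 and 3 are removed because a checked column is missing, while row 1 is kept even though its unchecked `notes` value is missing.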

# Create a new dataset based on the 'ess3' dataset
ess3_regional <- ess3

# Subset data to include only regions with a positive net EU migration rate
ess3_immigration_regional <- subset(ess3_regional, immigration_affected == 1)

# Define columns to check for missing values before analysis
columns_to_check <- c("gndr_dummyfemale", "lrscale", "agea", "unemployed_dummy",
                      "educ", "urban_dummy", "national_unemployment_level",
                      "national_social_protection", "regional_population_density",
                      "regional_old_age_dependency", "regional_foreign_born_population",
                      "regional_unemployment", "national_corruption_perception",
                      "indexwelfare_alesina_sbeqsoc_sbprvpv", "trstprl")

# Remove rows with any missing values in specified columns to ensure complete data for modeling
ess3_immigration_regional <- ess3_immigration_regional %>%
  filter(if_all(all_of(columns_to_check), ~ !is.na(.x)))  # if_all() replaces the deprecated across() inside filter()

# Estimate the generalised linear mixed model using the regional_foreign_born_EU_share variable
model1_regional <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ regional_foreign_born_EU_share +
                             gndr_dummyfemale + agea + lrscale + educ + unemployed_dummy + 
                             urban_dummy + trstprl +  regional_unemployment + 
                             regional_population_density + regional_old_age_dependency +
                             regional_foreign_born_nonEU_share + national_unemployment_level +
                             national_social_protection + 
                             national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                        data = ess3_immigration_regional, 
                        weights = pspwght, family = gaussian())

# calculate the dispersion estimate 
dispersion_estimate_model1_regional <- summary(model1_regional)$sigma^2

# Create a custom note to display in the regression output, including p-value significance levels and dispersion estimate
custom_note_text <- paste(
  "Standard errors are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.",
  "Dispersion estimate (σ²) for Gaussian family: Model 1 = ", 
  round(dispersion_estimate_model1_regional, 3),
  sep = "\n"
)

# display the results
texreg::knitreg(list(model1_regional), 
          custom.model.names = c("Model 1: Welfare State Support"), 
          caption = "Generalised Linear Mixed Model Regression Results for Hypothesis 1 using the regional share of EU-born immigrants", 
          digits = 3, custom.coef.names = c("(Intercept)", "Regional share of EU-born immigrants", 
                                            "Gender: Female", 
                                            "Age", "Left-Right Political Placement",
                                            "Education Level", 
                                            "Employment Status: Unemployed", 
                                            "Place of Residence: Urban", 
                                            "Trust in National Parliament",
                                            "Regional Unemployment Rate (20-64 years)", 
                                            "Regional Population Density", 
                                            "Regional Old Age Dependency",
                                            "Regional share of non-EU-born immigrants", 
                                            "National Unemployment Rate (20-64 years)", 
                                            "National Social Protection Expenditure (% of GDP)", 
                                            "National Corruption Perception Index"), 
          fit.headers = TRUE, 
          se = TRUE)
Generalised Linear Mixed Model Regression Results for Hypothesis 1 using the regional share of EU-born immigrants
  Model 1: Welfare State Support
(Intercept) 0.149
  (0.084)
Regional share of EU-born immigrants -0.740**
  (0.230)
Gender: Female 0.045***
  (0.005)
Age 0.001***
  (0.000)
Left-Right Political Placement -0.054***
  (0.001)
Education Level -0.016***
  (0.002)
Employment Status: Unemployed 0.106**
  (0.035)
Place of Residence: Urban 0.057***
  (0.008)
Trust in National Parliament 0.013***
  (0.001)
Regional Unemployment Rate (20-64 years) 0.009**
  (0.003)
Regional Population Density 0.000
  (0.000)
Regional Old Age Dependency -0.003*
  (0.001)
Regional share of non-EU-born immigrants -0.291
  (0.254)
National Unemployment Rate (20-64 years) -0.019***
  (0.005)
National Social Protection Expenditure (% of GDP) 0.007**
  (0.002)
National Corruption Perception Index 0.001
  (0.002)
AIC 60246.769
Log Likelihood -30104.384
Num. obs. 40546
Num. groups: cntry 17
Num. groups: cntry:nuts2 160
Var: cntry (Intercept) 0.030
Var: cntry:nuts2 (Intercept) 0.007
***p < 0.001; **p < 0.01; *p < 0.05

Measuring the Scale of EU Emigration at the Regional Level: Analysing Regional Net Migration Patterns

This code focuses on analysing the relationship between regional emigration and welfare state support across different NUTS2 regions. The process begins by creating a dataset of regions with negative net EU migration, indicating those affected by emigration. For these regions, it calculates the extent of emigration based on the negative values of the ‘regional_net_migration’ variable, and then computes a cumulative 4-year sum of emigration extent.

A log transformation is applied to the cumulative emigration data to stabilise variance. Using this transformed variable, a generalised linear mixed model (Model 2) with random intercepts for countries and for NUTS2 regions nested within countries is estimated to examine how cumulative regional emigration relates to support for the welfare state.
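
The three steps described above (emigration extent, 4-year cumulative sum, log transformation) can be sketched on a small invented panel. This is a minimal base-R illustration, assuming one row per region and year; the data frame `panel`, its values, the region codes and the helper `roll_sum_right()` are hypothetical and not part of the analysis.

```r
# Toy region-year panel; all values are invented for illustration.
panel <- data.frame(
  nuts2 = rep(c("AA01", "AA02"), each = 5),
  year  = rep(2010:2014, times = 2),
  regional_net_migration = c(-100, 50, -200, -300, -50,
                               10, -20,  -30,   40, -60)
)

# Emigration extent: absolute value of negative net migration, otherwise 0.
panel$emigration_extent <- pmax(-panel$regional_net_migration, 0)

# Hypothetical helper: right-aligned rolling sum over `width` values,
# returning NA until a full window is available.
roll_sum_right <- function(x, width = 4) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= width) {
    for (i in width:length(x)) out[i] <- sum(x[(i - width + 1):i])
  }
  out
}

# Sort by region and year, then apply the rolling sum within each region.
panel <- panel[order(panel$nuts2, panel$year), ]
panel$emigration_cum_4yr <- ave(panel$emigration_extent, panel$nuts2,
                                FUN = roll_sum_right)

# log(x + 1) keeps regions with zero cumulative emigration defined.
panel$log_emigration_cum_4yr <- log1p(panel$emigration_cum_4yr)

panel
```

In this toy panel the first three years of each region are NA because no complete 4-year window exists yet; the fourth and fifth years hold the sums of the preceding four values.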

# create a new dataset based on ess3
ess3_emigration_regional <- ess3

# Subset data to include only regions with a negative net EU migration rate
ess3_emigration_regional <- subset(ess3_emigration_regional, emigration_affected == 1)

# Define the extent of emigration based on negative net migration rates
# If 'regional_net_migration' is negative, assign the absolute value to 'emigration_extent'; otherwise, set to 0
ess3_emigration_regional <- ess3_emigration_regional %>%
  mutate(emigration_extent = ifelse(regional_net_migration < 0, -regional_net_migration, 0))

# Calculate a cumulative 4-year sum of 'emigration_extent' for each NUTS2 region
ess3_emigration_regional <- ess3_emigration_regional %>%
  group_by(nuts2) %>%
  arrange(ess_year) %>%
  mutate(
    emigration_extent_cumulative_4yr = zoo::rollapply(
      emigration_extent, 
      width = 4, 
      FUN = sum, 
      na.rm = TRUE, 
      fill = NA, 
      align = "right"
    )
  ) %>%
  ungroup()

# Log-transform the 4-year cumulative emigration extent to stabilise variance
ess3_emigration_regional$logemigration_extent_cumulative_4yr <- log(ess3_emigration_regional$emigration_extent_cumulative_4yr + 1)  # Adding 1 to avoid log(0)

# Estimate the generalised linear mixed model using the logemigration_extent_cumulative_4yr variable 
model2_regional <- glmmTMB(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                             logemigration_extent_cumulative_4yr +
                             gndr_dummyfemale + agea + lrscale + 
                             educ + unemployed_dummy + urban_dummy + 
                             trstprl +  regional_unemployment +
                             regional_population_density +
                             regional_old_age_dependency + 
                             regional_foreign_born_nonEU_log +
                             national_unemployment_level + 
                             national_social_protection+
                             national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                           data = ess3_emigration_regional, 
                           weights = pspwght, family = gaussian())

# Calculate the dispersion estimate for the model
dispersion_estimate_model2_regional <- summary(model2_regional)$sigma^2

# Customise the note for model results with significance levels and dispersion estimate
custom_note_text <- paste(
  "Standard errors are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.",
  "Dispersion estimate (σ²) for Gaussian family: Model 2 = ", 
  round(dispersion_estimate_model2_regional, 3),
  sep = "\n"
)

# display the results 
texreg::knitreg(list(model2_regional), 
          custom.model.names = c("Model 2: Welfare State Support"), 
          caption = "Generalised Linear Mixed Model Regression Results for Hypothesis 2 using the regional ", 
          digits = 3, custom.coef.names = c("(Intercept)", 
                                            "Log of Cumulative Regional Emigration (t-4)",
                                            "Gender: Female", 
                                            "Age", "Left-Right Political Placement",
                                            "Education Level", 
                                            "Employment Status: Unemployed", 
                                            "Place of Residence: Urban", 
                                            "Trust in National Parliament",
                                            "Regional Unemployment Rate (20-64 years)", 
                                            "Regional Population Density", 
                                            "Regional Old Age Dependency",
                                            "Regional Foreign-born Population (log)", 
                                            "National Unemployment Rate (20-64 years)", 
                                            "National Social Protection Expenditure (% of GDP)", 
                                            "National Corruption Perception Index"), 
          fit.headers = TRUE, 
          se = TRUE)
Generalised Linear Mixed Model Regression Results for Hypothesis 2 using the regional
  Model 2: Welfare State Support
(Intercept) 0.187
  (0.210)
Log of Cumulative Regional Emigration (t-4) 0.007***
  (0.002)
Gender: Female 0.057***
  (0.008)
Age 0.002***
  (0.000)
Left-Right Political Placement -0.012***
  (0.002)
Education Level -0.036***
  (0.003)
Employment Status: Unemployed 0.071
  (0.045)
Place of Residence: Urban -0.030**
  (0.010)
Trust in National Parliament -0.003
  (0.002)
Regional Unemployment Rate (20-64 years) -0.000
  (0.005)
Regional Population Density 0.000*
  (0.000)
Regional Old Age Dependency 0.000
  (0.003)
Regional Foreign-born Population (log) 0.008
  (0.011)
National Unemployment Rate (20-64 years) 0.000
  (0.010)
National Social Protection Expenditure (% of GDP) -0.003
  (0.009)
National Corruption Perception Index -0.004*
  (0.002)
AIC 31203.669
Log Likelihood -15582.834
Num. obs. 19390
Num. groups: cntry 11
Num. groups: cntry:nuts2 87
Var: cntry (Intercept) 0.010
Var: cntry:nuts2 (Intercept) 0.014
***p < 0.001; **p < 0.01; *p < 0.05

Does a High Outflow of Citizens Indicate Lower Economic Prosperity in Certain Regions?

Is a higher level of emigration associated with lower economic prosperity in certain regions? This code groups regions into four categories of cumulative emigration exposure over the last four years (low, moderate, high and very high), using cut points based on quantiles, and removes observations with missing values. It then produces a box plot of regional GDP per inhabitant (as a percentage of the EU average) for each category. The plot is descriptive: it shows whether regions with higher emigration exposure tend to have lower or higher GDP per inhabitant, but it does not by itself establish an economic effect of emigration.
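
The cut points can also be computed directly from the data rather than typed in by hand. The following is a minimal sketch under that assumption; the vector `emigration` holds invented values, and `cut_points` and `emigration_category` are illustrative names.

```r
# Invented cumulative emigration values; not taken from the ESS data.
emigration <- c(0, 120, 450, 900, 2500, 5400,
                8000, 15000, 29000, 31000, 60000, 75000)

# Quartile cut points (25%, 50%, 75%) computed from the data.
cut_points <- quantile(emigration, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

# Four ordered categories; intervals are closed on the right, as in cut(..., right = TRUE).
emigration_category <- cut(
  emigration,
  breaks = c(-Inf, cut_points, Inf),
  labels = c("Low Exposure", "Moderate Exposure",
             "High Exposure", "Very High Exposure"),
  right = TRUE
)

table(emigration_category)
```

One caveat: if many observations share the same value (for example, many zeros), two quantiles can coincide and `cut()` stops with an error because the breaks are not unique. In that case the break points have to be chosen manually.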

# specify cuts based on quantiles 
ess3_emigration_regional$emigration_category <- cut(ess3_emigration_regional$emigration_extent_cumulative_4yr,
                                                    breaks = c(-Inf, 0, 5396, 
                                                               29712, Inf),
                                                    labels = c("Low Exposure", 
                                                               "Moderate Exposure", 
                                                               "High Exposure", 
                                                               "Very High Exposure"),
                                                    right = TRUE)

# Remove observations with missing values  
ess3_emigration_regional <- ess3_emigration_regional %>% drop_na(emigration_extent_cumulative_4yr)

# Create the plot 
prosperity_emigration <- ggplot(ess3_emigration_regional, aes(x = emigration_category, 
                                                              y = regional_gdp_pps_eur_per_inhabitant_percentage_EU_average)) +
  geom_boxplot() +
  labs(title = "Regional GDP per Inhabitant by Regional Cumulative Exposure to Emigration (t-4)",
       x = "Emigration",
       y = "GDP per Inhabitant (% of EU Average)") +
  theme_minimal() + theme(plot.title = element_text(face = "bold"))

# display the plot 
prosperity_emigration

Is a High Share of Foreign-born EU Citizens Linked to Greater Economic Prosperity in Certain Regions?

Is a higher share of foreign-born EU citizens associated with greater economic prosperity in certain regions? This code groups regions into four categories of exposure to foreign-born EU citizens (low, moderate, high and very high), using cut points based on quantiles, and generates a box plot of regional GDP per inhabitant (as a percentage of the EU average) for each category. The plot shows how regional economic performance varies with the share of EU-born immigrants; it describes an association rather than an effect of immigration on prosperity.

# specify cuts based on quantiles 
# store the categories in a new column so that the numeric share used in model1_regional is kept
ess3_immigration_regional$regional_foreign_born_EU_share_category <- cut(ess3_immigration_regional$regional_foreign_born_EU_share,
                                                    breaks = c(-Inf, 0.0100648, 
                                                               0.0220295, 
                                                               0.0447341, Inf),
                                                    labels = c("Low Exposure", 
                                                               "Moderate Exposure", 
                                                               "High Exposure", 
                                                               "Very High Exposure"),
                                                    right = TRUE)

# create the plot 
prosperity_immigration <- ggplot(ess3_immigration_regional, aes(x = regional_foreign_born_EU_share_category, 
                                      y = regional_gdp_pps_eur_per_inhabitant_percentage_EU_average)) + 
  geom_boxplot() +
  labs(title = "Regional GDP per Inhabitant by Regional Exposure to Foreign-born EU Immigrants",
       x = "Foreign-born EU Immigrants",
       y = "GDP per Inhabitant (% of EU Average)") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

# display the plot 
prosperity_immigration
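
A box plot is easier to read alongside the medians it draws. The sketch below tabulates the median GDP per exposure category with base R's `aggregate()`; the data frame `toy` and all of its values are invented for illustration and say nothing about the actual regions.

```r
# Invented example data: three regions per exposure category.
exposure_levels <- c("Low Exposure", "Moderate Exposure",
                     "High Exposure", "Very High Exposure")

toy <- data.frame(
  exposure   = factor(rep(exposure_levels, each = 3), levels = exposure_levels),
  gdp_pct_eu = c(90, 100, 110,   # Low
                 80,  95, 120,   # Moderate
                 85, 105, 115,   # High
                 70, 100, 140)   # Very High
)

# Median GDP per inhabitant (% of EU average) within each category;
# these are the centre lines a box plot of the same data would show.
medians <- aggregate(gdp_pct_eu ~ exposure, data = toy, FUN = median)
medians
```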

How Immigration Attitudes Affect the Empirical Results for Hypotheses 1, 3 and 4

How do attitudes towards immigration influence the relationship between freedom of movement and welfare state support in the EU? In this step, the ESS immigration attitude variables imbgeco (economic threat), imueclt (cultural threat) and imwbcnt (whether immigration makes the country better or worse) are cleaned and reverse-coded. The rescaled imwbcnt variable is then added to hierarchical linear models 1, 3 and 4 to assess whether the estimated association between EU immigration and welfare state support changes once immigration attitudes are controlled for.
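
The cleaning and reverse-coding step can be sketched with one small function that is applied to each attitude variable in turn, which keeps the missing-code check tied to the variable being recoded. This is a minimal illustration; the helper `recode_and_reverse()` and the vector `imwbcnt_toy` are hypothetical, and the default `pivot = 11` simply mirrors the `11 - x` rescaling used in this section.

```r
# Hypothetical helper: set the missing codes to NA, then reverse the scale.
recode_and_reverse <- function(x, missing_codes = c(77, 88, 99), pivot = 11) {
  x <- as.numeric(x)                 # drop any labels or attributes
  x[x %in% missing_codes] <- NA_real_
  pivot - x                          # higher values = more negative attitude
}

# Invented responses, including the three missing codes.
imwbcnt_toy <- c(0, 3, 10, 77, 88, 99)
recode_and_reverse(imwbcnt_toy)
```

For the toy vector, the valid answers 0, 3 and 10 become 11, 8 and 1, and the three missing codes become NA.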

# create a new dataset based on ess3_immigration
ess3_attitudes_overall <- ess3_immigration

# immigration seen as an economic threat, variable imbgeco
# specify NA's and rescale the values 
ess3_attitudes_overall <- ess3_attitudes_overall %>%
  mutate(imbgeco = case_when(
    imbgeco %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ imbgeco
    ))

ess3_attitudes_overall$imbgeco_rescaled <- 11 - ess3_attitudes_overall$imbgeco

# immigration seen as a cultural threat, variable imueclt
# specify NAs (the values are rescaled further below) 
ess3_attitudes_overall <- ess3_attitudes_overall %>%
  mutate(imueclt = case_when(
    imueclt %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ imueclt
    ))

# immigration makes country better or worse, variable imwbcnt
# specify NAs and rescale the values 
ess3_attitudes_overall <- ess3_attitudes_overall %>%
  mutate(imwbcnt = case_when(
    imwbcnt %in% c(77, 88, 99) ~ NA_real_,
    TRUE ~ imwbcnt
    ))

ess3_attitudes_overall$imwbcnt_rescaled <- 11 - ess3_attitudes_overall$imwbcnt

ess3_attitudes_overall$imueclt_rescaled <- 11 - ess3_attitudes_overall$imueclt

# Make the variable imbgeco_rescaled numeric 
ess3_attitudes_overall$imbgeco_rescaled <- as.numeric(ess3_attitudes_overall$imbgeco_rescaled)

# make the variable lknemny numeric 
ess3_attitudes_overall$lknemny <- as.numeric(ess3_attitudes_overall$lknemny)

# specify model 1 with immigration attitudes, dataset ess3_attitudes_overall
model1_attitudes <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                           log_EU_immigration_cumulative_4yr+imwbcnt_rescaled + 
                           gndr_dummyfemale + agea + 
                           lrscale + educ + unemployed_dummy + urban_dummy + 
                           trstprl +  regional_unemployment + 
                           regional_population_density +
                           regional_old_age_dependency + 
                           regional_foreign_born_population_log +
                           national_unemployment_level + 
                           national_social_protection +
                           national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_attitudes_overall, 
                weights = pspwght)

# specify model 3 with immigration attitudes, dataset ess3_attitudes_overall
model3_attitudes <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                           log_EU_immigration_cumulative_4yr*regional_gdp_pps_eur_per_inhabitant_percentage_EU_average+imwbcnt_rescaled + 
                           gndr_dummyfemale + agea + lrscale + educ + 
                           unemployed_dummy + 
                           urban_dummy + trstprl +  regional_unemployment + 
                           regional_population_density +
                           regional_old_age_dependency + 
                           regional_foreign_born_population_log +
                           national_unemployment_level + 
                           national_social_protection +
                           national_corruption_perception + (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_attitudes_overall, 
                weights = pspwght)

# specify model 4 with immigration attitudes, dataset ess3_attitudes_overall
model4_attitudes <- lmer(indexwelfare_alesina_sbeqsoc_sbprvpv ~ 
                           log_EU_immigration_cumulative_4yr*lknemny+imwbcnt_rescaled + 
                           gndr_dummyfemale + agea + lrscale + educ + 
                           unemployed_dummy + 
                           urban_dummy + trstprl +
                           regional_unemployment + 
                           regional_population_density + 
                           regional_old_age_dependency +
                           regional_foreign_born_population_log + 
                           national_unemployment_level +
                           national_social_protection + 
                           national_corruption_perception +
                           (1 | cntry) + (1 | cntry:nuts2),
                data = ess3_attitudes_overall, 
                weights = pspwght)

# display the results of the hierarchical linear models 
texreg::knitreg(list(model1_attitudes, model3_attitudes, model4_attitudes), 
          custom.model.names = c("Model 1: Welfare State Support", 
                                 "Model 3: Welfare State Support", 
                                 "Model 4: Welfare State Support"), 
          caption = "Hierarchical Linear Model Regression Results for Hypotheses 1, 3 and 4 with Immigration Attitudes", 
          digits = 3, custom.coef.names = c("(Intercept)", 
                                            "Log of Cumulative EU Immigration (t-4)", 
                                            "Anti-Immigration Attitudes", 
                                            "Gender: Female", "Age", 
                                            "Left-Right Political Placement", 
                                            "Education Level",
                                            "Employment Status: Unemployed", 
                                            "Place of Residence: Urban",
                                            "Trust in National Parliament", 
                                            "Regional Unemployment Rate (20-64 years)", 
                                            "Regional Population Density", 
                                            "Regional Old Age Dependency",
                                            "Regional Foreign-born Population (log)", 
                                            "National Unemployment Rate (20-64 years)", 
                                            "National Social Protection Expenditure (% of GDP)", 
                                            "National Corruption Perception Index", 
                                            "Regional Economic Disparities", 
                                            "Log of Cumulative EU Immigration (t-4) x Regional Economic Disparities", 
                                            "Individual Economic Insecurities", 
                                            "Log of Cumulative EU Immigration (t-4) x Individual Economic Insecurities"), 
          custom.note = "Standard errors are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.",
          fit.headers = TRUE, 
          se = TRUE)
Hierarchical Linear Model Regression Results for Hypotheses 1, 3 and 4 with Immigration Attitudes
  Model 1: Welfare State Support Model 3: Welfare State Support Model 4: Welfare State Support
(Intercept) 1.378*** 1.137* 1.340***
  (0.303) (0.454) (0.304)
Log of Cumulative EU Immigration (t-4) -0.088** -0.067 -0.094***
  (0.027) (0.038) (0.027)
Anti-Immigration Attitudes -0.013*** -0.013*** -0.015***
  (0.001) (0.001) (0.001)
Gender: Female 0.044*** 0.045*** 0.040***
  (0.005) (0.005) (0.005)
Age 0.001*** 0.001*** 0.001***
  (0.000) (0.000) (0.000)
Left-Right Political Placement -0.053*** -0.052*** -0.052***
  (0.001) (0.001) (0.001)
Education Level -0.018*** -0.017*** -0.017***
  (0.002) (0.002) (0.002)
Employment Status: Unemployed 0.100** 0.097** 0.050
  (0.035) (0.035) (0.035)
Place of Residence: Urban 0.055*** 0.054*** 0.052***
  (0.008) (0.008) (0.008)
Trust in National Parliament 0.010*** 0.010*** 0.013***
  (0.001) (0.001) (0.001)
Regional Unemployment Rate (20-64 years) 0.010*** 0.008* 0.009**
  (0.003) (0.003) (0.003)
Regional Population Density -0.000 0.000 -0.000
  (0.000) (0.000) (0.000)
Regional Old Age Dependency -0.002 -0.002 -0.002
  (0.001) (0.001) (0.001)
Regional Foreign-born Population (log) -0.025*** -0.020* -0.023**
  (0.007) (0.008) (0.007)
National Unemployment Rate (20-64 years) -0.027*** -0.028*** -0.028***
  (0.005) (0.005) (0.005)
National Social Protection Expenditure (% of GDP) 0.013*** 0.013*** 0.013***
  (0.003) (0.003) (0.003)
National Corruption Perception Index 0.001 0.001 0.002
  (0.002) (0.002) (0.002)
Regional Economic Disparities   0.003  
    (0.002)  
Log of Cumulative EU Immigration (t-4) x Regional Economic Disparities   -0.000  
    (0.000)  
Individual Economic Insecurities     -0.057
      (0.031)
Log of Cumulative EU Immigration (t-4) x Individual Economic Insecurities     0.008**
      (0.002)
AIC 64808.031 62339.698 59544.127
BIC 64979.959 62527.989 59731.901
Log Likelihood -32384.015 -31147.849 -29750.064
Num. obs. 39992 38511 37616
Num. groups: cntry:nuts2 160 160 151
Num. groups: cntry 17 17 16
Var: cntry:nuts2 (Intercept) 0.007 0.007 0.006
Var: cntry (Intercept) 0.041 0.043 0.042
Var: Residual 0.256 0.255 0.247
Standard errors are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.
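
The variance components in the last rows of the table can be turned into approximate variance shares for each level. The sketch below does this for Model 1, using the three-decimal values printed above; it is a rough illustration only, since the inputs are rounded and the shares are conditional on the covariates in the model. The object names are illustrative.

```r
# Variance components for Model 1, copied from the table above (rounded).
var_region   <- 0.007  # Var: cntry:nuts2 (Intercept)
var_country  <- 0.041  # Var: cntry (Intercept)
var_residual <- 0.256  # Var: Residual

var_total <- var_region + var_country + var_residual

# Share of the unexplained variance located at each level.
share_country  <- var_country  / var_total
share_region   <- var_region   / var_total
share_residual <- var_residual / var_total

round(c(country = share_country,
        region = share_region,
        residual = share_residual), 3)
```

With these rounded inputs, roughly 13.5% of the remaining variance lies between countries, about 2.3% between regions within countries, and about 84.2% between individuals.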

9. Graphs and Tables

9.1 Included regions and countries

Initial Selection and Deduplication:

  • Selected columns (nuts2, cntry, name_country, nuts_level, NAME_LATN) from ess3.

  • Removed duplicate rows.

Merge with Geo Data:

  • Loaded and selected relevant columns (NUTS_ID, NAME_LATN) from eurostat_geodata_60_2016.

  • Renamed columns and merged with graph1 by nuts2.

  • Converted NAME_LATN_NEW to title case.

Data Cleaning and Transformation:

  • Retained only relevant columns and ensured data uniqueness.

  • Converted nuts_level to numeric and recoded values:

    • Level 3 and 2 were set to 2.

    • Level 1 remained 1.

  • Removed Belgium’s level 2 regions.

Column Renaming:

  • Renamed columns for clarity:

    • nuts2 to “NUTS Region”

    • name_country to “Country”

    • nuts_level to “NUTS level”

    • NAME_LATN_NEW to “Region’s name”

# Select the columns and remove duplicate rows
graph1 <- dplyr::select(ess3, nuts2, cntry, 
                        name_country, nuts_level, NAME_LATN) %>% unique()

geo_data <- eurostat_geodata_60_2016
geo_data <- dplyr::select(geo_data, NUTS_ID, NAME_LATN)

colnames(geo_data)[1] <- "nuts2"
colnames(geo_data)[2] <- "NAME_LATN_NEW"

graph1 <- merge(graph1, geo_data, by = c("nuts2"))
graph1$NAME_LATN_NEW <- tools::toTitleCase(tolower(graph1$NAME_LATN_NEW)) # there is no missing value 

# only keep the relevant data 
graph1 <- graph1[, c("nuts2", "cntry", 
                     "name_country", "nuts_level", "NAME_LATN_NEW")]

graph1 <- unique(graph1)

# convert nuts_level to numeric (via character, in case it is stored as a factor)
graph1$nuts_level <- as.numeric(as.character(graph1$nuts_level))

graph1 <- graph1 %>%
  mutate(nuts_level = case_when(
    nuts_level == 3 ~ 2,
    nuts_level == 2 ~ 2,
    nuts_level == 1 ~ 1,
    TRUE ~ NA_real_ 
))

graph1 <- unique(graph1)

# remove Belgium's NUTS level 2 regions
graph1 <- graph1 %>%
  filter(!(nuts_level %in% c(2) & cntry == "BE"))

colnames(graph1)[1] <- "NUTS Region"

colnames(graph1)[3] <- "Country"

colnames(graph1)[4] <- "NUTS level"

colnames(graph1)[5] <- "Region's name"
graph1
NUTS Region cntry Country NUTS level Region’s name
AT11 AT Austria 2 Burgenland
AT12 AT Austria 2 Niederösterreich
AT13 AT Austria 2 Wien
AT21 AT Austria 2 Kärnten
AT22 AT Austria 2 Steiermark
AT31 AT Austria 2 Oberösterreich
AT32 AT Austria 2 Salzburg
AT33 AT Austria 2 Tirol
AT34 AT Austria 2 Vorarlberg
BG31 BG Bulgaria 2 Severozapaden
BG32 BG Bulgaria 2 Severen Tsentralen
BG33 BG Bulgaria 2 Severoiztochen
BG34 BG Bulgaria 2 Yugoiztochen
BG41 BG Bulgaria 2 Yugozapaden
BG42 BG Bulgaria 2 Yuzhen Tsentralen
CH01 CH Switzerland 2 Région Lémanique
CH02 CH Switzerland 2 Espace Mittelland
CH03 CH Switzerland 2 Nordwestschweiz
CH04 CH Switzerland 2 Zürich
CH05 CH Switzerland 2 Ostschweiz
CH06 CH Switzerland 2 Zentralschweiz
CH07 CH Switzerland 2 Ticino
CZ01 CZ Czech Republic 2 Praha
CZ02 CZ Czech Republic 2 Střední Čechy
CZ03 CZ Czech Republic 2 Jihozápad
CZ04 CZ Czech Republic 2 Severozápad
CZ05 CZ Czech Republic 2 Severovýchod
CZ06 CZ Czech Republic 2 Jihovýchod
CZ07 CZ Czech Republic 2 Střední Morava
CZ08 CZ Czech Republic 2 Moravskoslezsko
DE1 DE Germany 1 Baden-Württemberg
DE2 DE Germany 1 Bayern
DE3 DE Germany 1 Berlin
DE4 DE Germany 1 Brandenburg
DE5 DE Germany 1 Bremen
DE6 DE Germany 1 Hamburg
DE7 DE Germany 1 Hessen
DE8 DE Germany 1 Mecklenburg-Vorpommern
DE9 DE Germany 1 Niedersachsen
DEA DE Germany 1 Nordrhein-Westfalen
DEB DE Germany 1 Rheinland-Pfalz
DEC DE Germany 1 Saarland
DED DE Germany 1 Sachsen
DEE DE Germany 1 Sachsen-Anhalt
DEF DE Germany 1 Schleswig-Holstein
DEG DE Germany 1 Thüringen
DK01 DK Denmark 2 Hovedstaden
DK02 DK Denmark 2 Sjælland
DK03 DK Denmark 2 Syddanmark
DK04 DK Denmark 2 Midtjylland
DK05 DK Denmark 2 Nordjylland
EE00 EE Estonia 2 Eesti
EL30 GR Greece 2 Attiki
EL43 GR Greece 2 Kriti
EL51 GR Greece 2 Anatoliki Makedonia, Thraki
EL52 GR Greece 2 Kentriki Makedonia
EL53 GR Greece 2 Dytiki Makedonia
EL61 GR Greece 2 Thessalia
EL63 GR Greece 2 Dytiki Ellada
EL64 GR Greece 2 Sterea Ellada
EL65 GR Greece 2 Peloponnisos
ES11 ES Spain 2 Galicia
ES12 ES Spain 2 Principado De Asturias
ES13 ES Spain 2 Cantabria
ES21 ES Spain 2 País Vasco
ES22 ES Spain 2 Comunidad Foral De Navarra
ES23 ES Spain 2 La Rioja
ES24 ES Spain 2 Aragón
ES30 ES Spain 2 Comunidad De Madrid
ES41 ES Spain 2 Castilla y León
ES42 ES Spain 2 Castilla-La Mancha
ES43 ES Spain 2 Extremadura
ES51 ES Spain 2 Cataluña
ES52 ES Spain 2 Comunidad Valenciana
ES53 ES Spain 2 Illes Balears
ES61 ES Spain 2 Andalucía
ES62 ES Spain 2 Región De Murcia
ES63 ES Spain 2 Ciudad Autónoma De Ceuta
ES70 ES Spain 2 Canarias
FI19 FI Finland 2 Länsi-Suomi
FI1B FI Finland 2 Helsinki-Uusimaa
FI1C FI Finland 2 Etelä-Suomi
FI1D FI Finland 2 Pohjois- Ja Itä-Suomi
FR10 FR France 2 Ile-De-France
FRB0 FR France 2 Centre - Val De Loire
FRC1 FR France 2 Bourgogne
FRC2 FR France 2 Franche-Comté
FRD1 FR France 2 Basse-Normandie
FRD2 FR France 2 Haute-Normandie
FRE1 FR France 2 Nord-Pas De Calais
FRE2 FR France 2 Picardie
FRF1 FR France 2 Alsace
FRF2 FR France 2 Champagne-Ardenne
FRF3 FR France 2 Lorraine
FRG0 FR France 2 Pays De La Loire
FRH0 FR France 2 Bretagne
FRI1 FR France 2 Aquitaine
FRI2 FR France 2 Limousin
FRI3 FR France 2 Poitou-Charentes
FRJ1 FR France 2 Languedoc-Roussillon
FRJ2 FR France 2 Midi-Pyrénées
FRK1 FR France 2 Auvergne
FRK2 FR France 2 Rhône-Alpes
FRL0 FR France 2 Provence-Alpes-Côte D’azur
HU11 HU Hungary 2 Budapest
HU12 HU Hungary 2 Pest
HU21 HU Hungary 2 Közép-Dunántúl
HU22 HU Hungary 2 Nyugat-Dunántúl
HU23 HU Hungary 2 Dél-Dunántúl
HU31 HU Hungary 2 Észak-Magyarország
HU32 HU Hungary 2 Észak-Alföld
HU33 HU Hungary 2 Dél-Alföld
IE04 IE Ireland 2 Northern and Western
IE05 IE Ireland 2 Southern
IE06 IE Ireland 2 Eastern and Midland
ITC1 IT Italy 2 Piemonte
ITC2 IT Italy 2 Valle D’aosta/Vallée D’aoste
ITC3 IT Italy 2 Liguria
ITC4 IT Italy 2 Lombardia
ITF1 IT Italy 2 Abruzzo
ITF3 IT Italy 2 Campania
ITF4 IT Italy 2 Puglia
ITF5 IT Italy 2 Basilicata
ITF6 IT Italy 2 Calabria
ITG1 IT Italy 2 Sicilia
ITG2 IT Italy 2 Sardegna
ITH1 IT Italy 2 Provincia Autonoma Di Bolzano/Bozen
ITH2 IT Italy 2 Provincia Autonoma Di Trento
ITH3 IT Italy 2 Veneto
ITH4 IT Italy 2 Friuli-Venezia Giulia
ITH5 IT Italy 2 Emilia-Romagna
ITI1 IT Italy 2 Toscana
ITI2 IT Italy 2 Umbria
ITI3 IT Italy 2 Marche
ITI4 IT Italy 2 Lazio
LT01 LT Lithuania 2 Sostinės Regionas
LT02 LT Lithuania 2 Vidurio Ir Vakarų Lietuvos Regionas
LV00 LV Latvia 2 Latvija
NL11 NL Netherlands 2 Groningen
NL12 NL Netherlands 2 Friesland (Nl)
NL13 NL Netherlands 2 Drenthe
NL21 NL Netherlands 2 Overijssel
NL22 NL Netherlands 2 Gelderland
NL23 NL Netherlands 2 Flevoland
NL31 NL Netherlands 2 Utrecht
NL32 NL Netherlands 2 Noord-Holland
NL33 NL Netherlands 2 Zuid-Holland
NL34 NL Netherlands 2 Zeeland
NL41 NL Netherlands 2 Noord-Brabant
NL42 NL Netherlands 2 Limburg (Nl)
NO01 NO Norway 2 Oslo Og Akershus
NO02 NO Norway 2 Hedmark Og Oppland
NO03 NO Norway 2 Sør-Østlandet
NO04 NO Norway 2 Agder Og Rogaland
NO05 NO Norway 2 Vestlandet
NO06 NO Norway 2 Trøndelag
NO07 NO Norway 2 Nord-Norge
PL21 PL Poland 2 Małopolskie
PL22 PL Poland 2 Śląskie
PL41 PL Poland 2 Wielkopolskie
PL42 PL Poland 2 Zachodniopomorskie
PL43 PL Poland 2 Lubuskie
PL51 PL Poland 2 Dolnośląskie
PL52 PL Poland 2 Opolskie
PL61 PL Poland 2 Kujawsko-Pomorskie
PL62 PL Poland 2 Warmińsko-Mazurskie
PL63 PL Poland 2 Pomorskie
PL71 PL Poland 2 Łódzkie
PL72 PL Poland 2 Świętokrzyskie
PL81 PL Poland 2 Lubelskie
PL82 PL Poland 2 Podkarpackie
PL84 PL Poland 2 Podlaskie
PL92 PL Poland 2 Mazowiecki Regionalny
PT11 PT Portugal 2 Norte
PT15 PT Portugal 2 Algarve
PT16 PT Portugal 2 Centro (Pt)
PT17 PT Portugal 2 Área Metropolitana De Lisboa
PT18 PT Portugal 2 Alentejo
RO11 RO Romania 2 Nord-Vest
RO12 RO Romania 2 Centru
RO21 RO Romania 2 Nord-Est
RO22 RO Romania 2 Sud-Est
RO31 RO Romania 2 Sud - Muntenia
RO32 RO Romania 2 Bucureşti - Ilfov
RO41 RO Romania 2 Sud-Vest Oltenia
RO42 RO Romania 2 Vest
SE11 SE Sweden 2 Stockholm
SE12 SE Sweden 2 Östra Mellansverige
SE21 SE Sweden 2 Småland Med Öarna
SE22 SE Sweden 2 Sydsverige
SE23 SE Sweden 2 Västsverige
SE31 SE Sweden 2 Norra Mellansverige
SE32 SE Sweden 2 Mellersta Norrland
SE33 SE Sweden 2 Övre Norrland
SI03 SI Slovenia 2 Vzhodna Slovenija
SI04 SI Slovenia 2 Zahodna Slovenija
SK01 SK Slovakia 2 Bratislavský Kraj
SK02 SK Slovakia 2 Západné Slovensko
SK03 SK Slovakia 2 Stredné Slovensko
SK04 SK Slovakia 2 Východné Slovensko
UKC GB United Kingdom 1 North East (England)
UKD GB United Kingdom 1 North West (England)
UKE GB United Kingdom 1 Yorkshire and the Humber
UKF GB United Kingdom 1 East Midlands (England)
UKG GB United Kingdom 1 West Midlands (England)
UKH GB United Kingdom 1 East of England
UKI GB United Kingdom 1 London
UKJ GB United Kingdom 1 South East (England)
UKK GB United Kingdom 1 South West (England)
UKL GB United Kingdom 1 Wales
UKM GB United Kingdom 1 Scotland
UKN GB United Kingdom 1 Northern Ireland

9.2 Create Descriptive Summary Statistics for the Entire ESS Dataset

Data Selection:

  • Selected relevant columns from ess3 for analysis.

  • Renamed columns for clarity and consistency.

Column Names:

  • Support for Welfare State: indexwelfare_alesina_sbeqsoc_sbprvpv

  • Gender (Female): gndr_dummyfemale

  • Age: agea

  • Employment Status (Unemployed): unemployed_dummy

  • Place of Residence (Urban Area): urban_dummy

  • Left-Right Political Placement: lrscale

  • National Unemployment Rate (20-64 years): national_unemployment_level

  • Trust in National Parliament: trstprl

  • Regional Unemployment Rate (20-64 years): regional_unemployment

  • Regional Population Density: regional_population_density

  • National Corruption Perception Index: national_corruption_perception

  • National Social Protection Expenditure (% of GDP): national_social_protection

  • Cumulative Net EU Migration (t-4): EU_net_migration_mean_4yr

  • Log of Cumulative EU Immigration (t-4): log_EU_immigration_cumulative_4yr

  • Log of Cumulative EU Emigration (t-4): log_EU_emigration_cumulative_4yr

  • Individual Economic Insecurities: lknemny

  • Regional Economic Disparity: regional_gdp_pps_eur_per_inhabitant_percentage_EU_average

  • Regional Old-Age Dependency Ratio: regional_old_age_dependency

  • Education Level: educ

  • Log of Regional Foreign-Born Population: regional_foreign_born_population_log

Summary Statistics:

  • Generated a summary of descriptive statistics for the selected variables.

Table Generation:

  • Created an HTML table of the summary statistics using knitr::kable() for clear presentation.

# create a new dataset based on ess3
descriptive <- dplyr::select(ess3, indexwelfare_alesina_sbeqsoc_sbprvpv, 
                             EU_net_migration_mean_4yr, 
                             log_EU_immigration_cumulative_4yr,
                             log_EU_emigration_cumulative_4yr,
                             regional_gdp_pps_eur_per_inhabitant_percentage_EU_average, 
                             lknemny,
                             gndr_dummyfemale, agea, lrscale, educ, 
                             unemployed_dummy, urban_dummy, 
                             trstprl, regional_unemployment, 
                             regional_population_density,
                             regional_old_age_dependency, 
                             regional_foreign_born_population_log,
                             national_unemployment_level, 
                             national_social_protection,
                             national_corruption_perception)

colnames(descriptive)[colnames(descriptive) == "indexwelfare_alesina_sbeqsoc_sbprvpv"] ="Support for Welfare State"

colnames(descriptive)[colnames(descriptive) == "gndr_dummyfemale"] ="Gender (Female)"

colnames(descriptive)[colnames(descriptive) == "agea"] ="Age"

colnames(descriptive)[colnames(descriptive) == "unemployed_dummy"] ="Employment Status (Unemployed)"

colnames(descriptive)[colnames(descriptive) == "urban_dummy"] ="Place of Residence (Urban Area)"

colnames(descriptive)[colnames(descriptive) == "lrscale"] ="Left-Right Political Placement"

colnames(descriptive)[colnames(descriptive) == "national_unemployment_level"] ="National Unemployment Rate (20-64 years)"

colnames(descriptive)[colnames(descriptive) == "trstprl"] ="Trust in National Parliament"

colnames(descriptive)[colnames(descriptive) == "regional_unemployment"] ="Regional Unemployment Rate (20-64 years)"

colnames(descriptive)[colnames(descriptive) == "regional_population_density"] ="Regional Population Density"

colnames(descriptive)[colnames(descriptive) == "national_corruption_perception"] ="National Corruption Perception Index"

colnames(descriptive)[colnames(descriptive) == "national_social_protection"] ="National Social Protection Expenditure (% of GDP)"

colnames(descriptive)[colnames(descriptive) == "EU_net_migration_mean_4yr"] ="Cumulative Net EU Migration (t-4)"
colnames(descriptive)[colnames(descriptive) == "log_EU_immigration_cumulative_4yr"] ="Log of Cumulative EU Immigration (t-4)"

colnames(descriptive)[colnames(descriptive) == "log_EU_emigration_cumulative_4yr"] ="Log of Cumulative EU Emigration (t-4)"

colnames(descriptive)[colnames(descriptive) == "lknemny"] ="Individual Economic Insecurities"

colnames(descriptive)[colnames(descriptive) == "regional_gdp_pps_eur_per_inhabitant_percentage_EU_average"] ="Regional Economic Disparities"

colnames(descriptive)[colnames(descriptive) == "regional_old_age_dependency"] ="Regional Old-Age Dependency Ratio"

colnames(descriptive)[colnames(descriptive) == "educ"] ="Education Level"

colnames(descriptive)[colnames(descriptive) == "regional_foreign_born_population_log"] ="Log of Regional Foreign-Born Population"

summary_table_overall <- describe(descriptive)
summary_table_overall <- dplyr::select(summary_table_overall, n, mean, sd, median, min, max)
summary_table_overall <- round(summary_table_overall, 1)
summary_table_overall
Variable                                               n      mean        sd    median        min        max
Support for Welfare State                          60197       0.0       0.6       0.0       -3.2        1.3
Cumulative Net EU Migration (t-4)                  60197  121739.6  287612.1   51554.0  -301223.0  1141443.0
Log of Cumulative EU Immigration (t-4)             60197      11.7       1.8      11.8        5.6       14.4
Log of Cumulative EU Emigration (t-4)              60197      11.5       1.5      11.8        7.7       13.9
Regional Economic Disparities                      58683     103.5      43.6      98.0       28.0      258.0
Individual Economic Insecurities                   56917       2.0       0.9       2.0        1.0        4.0
Gender (Female)                                    60197       0.5       0.5       1.0        0.0        1.0
Age                                                60197      48.9      17.9      49.0       15.0      123.0
Left-Right Political Placement                     60197       5.1       2.2       5.0        0.0       10.0
Education Level                                    60197       3.3       1.3       3.0        0.0        5.0
Employment Status (Unemployed)                     60197       0.0       0.1       0.0        0.0        1.0
Place of Residence (Urban Area)                    60197       0.2       0.4       0.0        0.0        1.0
Trust in National Parliament                       60197       4.6       2.5       5.0        0.0       10.0
Regional Unemployment Rate (20-64 years)           60197       6.7       3.8       6.0        1.6       28.5
Regional Population Density                        60197     341.7     779.3     121.1        3.3     7454.6
Regional Old-Age Dependency Ratio                  60197      29.2       5.5      28.9       14.7       50.2
Log of Regional Foreign-Born Population            60197      11.2       1.5      11.3        6.6       14.7
National Unemployment Rate (20-64 years)           60197       6.7       3.1       6.1        2.0       19.4
National Social Protection Expenditure (% of GDP)  60197      23.6       5.8      23.8       12.0       34.3
National Corruption Perception Index               60197      31.0      15.7      30.0        7.0       64.0
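
The select, rename and summarise steps above are repeated almost verbatim for each subset. A minimal sketch of how they could be wrapped in one helper is shown here, using base R only. The function name describe_labelled, the toy data frame and its values are assumptions made for illustration; the helper computes the same six statistics by hand rather than calling psych::describe().

```r
# Hypothetical helper: pick columns, attach display labels, compute summary statistics.
# `labels` is a named character vector: names are column names, values are display labels.
describe_labelled <- function(data, labels, digits = 1) {
  cols <- as.data.frame(data)[names(labels)]
  stats <- t(sapply(cols, function(x) {
    x <- x[!is.na(x)]                      # drop missing values per variable
    c(n = length(x), mean = mean(x), sd = sd(x),
      median = median(x), min = min(x), max = max(x))
  }))
  rownames(stats) <- unname(labels)        # labels follow the order of `labels`
  round(as.data.frame(stats), digits)
}

# Toy data standing in for ess3 (values are made up for illustration)
toy <- data.frame(agea = c(20, 35, 50, NA), lrscale = c(0, 5, 10, 5))
toy_labels <- c(agea = "Age", lrscale = "Left-Right Political Placement")

summary_toy <- describe_labelled(toy, toy_labels)
summary_toy
```

Because the labels are attached by column name, the same lookup vector could be reused for the full dataset and for both subsets without repeating the rename lines.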

9.3 Create Descriptive Summary Statistics for the Immigration-Affected Subset

The code generates and displays summary statistics for selected variables from the ess3_immigration dataset.

  • Data Selection: Uses dplyr::select() to extract relevant columns, including welfare support and immigration metrics.

  • Renaming Columns: Updates column names to descriptive labels for clarity.

  • Summary Statistics: Computes n, mean, standard deviation, median, minimum and maximum with psych::describe().

  • Presentation: Rounds the statistics to one decimal place and prints the resulting table.

# create a new dataset based on ess3_immigration
descriptive_immigration <- dplyr::select(ess3_immigration, indexwelfare_alesina_sbeqsoc_sbprvpv,
                                         EU_net_migration_mean_4yr, log_EU_immigration_cumulative_4yr,
                                         log_EU_emigration_cumulative_4yr,
                                         regional_gdp_pps_eur_per_inhabitant_percentage_EU_average,
                                         lknemny,
                                         gndr_dummyfemale, agea, lrscale, educ, unemployed_dummy,
                                         urban_dummy, trstprl, regional_unemployment,
                                         regional_population_density, regional_old_age_dependency,
                                         regional_foreign_born_population_log,
                                         national_unemployment_level,
                                         national_social_protection,
                                         national_corruption_perception)

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "indexwelfare_alesina_sbeqsoc_sbprvpv"] ="Support for Welfare State"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "gndr_dummyfemale"] ="Gender (Female)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "agea"] ="Age"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "unemployed_dummy"] ="Employment Status (Unemployed)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "urban_dummy"] ="Place of Residence (Urban Area)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "lrscale"] ="Left-Right Political Placement"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "trstprl"] ="Trust in National Parliament"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "national_unemployment_level"] ="National Unemployment Rate (20-64 years)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "regional_unemployment"] ="Regional Unemployment Rate (20-64 years)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "regional_population_density"] ="Regional Population Density"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "national_corruption_perception"] ="National Corruption Perception Index"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "national_social_protection"] ="National Social Protection Expenditure (% of GDP)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "EU_net_migration_mean_4yr"] ="Cumulative Net EU Migration (t-4)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "log_EU_immigration_cumulative_4yr"] ="Log of Cumulative EU Immigration (t-4)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "log_EU_emigration_cumulative_4yr"] ="Log of Cumulative EU Emigration (t-4)"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "lknemny"] ="Individual Economic Insecurities"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "regional_gdp_pps_eur_per_inhabitant_percentage_EU_average"] ="Regional Economic Disparities"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "regional_old_age_dependency"] ="Regional Old-Age Dependency Ratio"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "educ"] ="Education Level"

colnames(descriptive_immigration)[colnames(descriptive_immigration) == "regional_foreign_born_population_log"] ="Log of Regional Foreign-Born Population"

summary_table_immigration <- describe(descriptive_immigration)
summary_table_immigration <- dplyr::select(summary_table_immigration, n, mean, sd, median, min, max)
summary_table_immigration <- round(summary_table_immigration, 1)
summary_table_immigration
Variable                                               n      mean        sd    median        min        max
Support for Welfare State                          40546       0.0       0.6       0.0       -3.2        1.3
Cumulative Net EU Migration (t-4)                  40546  216184.8  301242.7   77647.0      414.0  1141443.0
Log of Cumulative EU Immigration (t-4)             40546      12.3       1.3      12.3        9.6       14.4
Log of Cumulative EU Emigration (t-4)              40546      11.7       1.4      11.9        8.0       13.9
Regional Economic Disparities                      39032     119.0      42.7     113.0       39.0      258.0
Individual Economic Insecurities                   38091       1.8       0.8       2.0        1.0        4.0
Gender (Female)                                    40546       0.5       0.5       1.0        0.0        1.0
Age                                                40546      48.9      17.9      49.0       15.0      123.0
Left-Right Political Placement                     40546       5.0       2.1       5.0        0.0       10.0
Education Level                                    40546       3.4       1.3       3.0        0.0        5.0
Employment Status (Unemployed)                     40546       0.0       0.1       0.0        0.0        1.0
Place of Residence (Urban Area)                    40546       0.2       0.4       0.0        0.0        1.0
Trust in National Parliament                       40546       5.0       2.4       5.0        0.0       10.0
Regional Unemployment Rate (20-64 years)           40546       6.0       3.1       5.1        1.6       16.9
Regional Population Density                        40546     408.9     889.1     162.7        3.3     7454.6
Regional Old-Age Dependency Ratio                  40546      29.2       5.8      28.7       14.7       46.5
Log of Regional Foreign-Born Population            40546      11.5       1.3      11.4        8.1       14.7
National Unemployment Rate (20-64 years)           40546       5.9       2.3       5.1        2.0       10.7
National Social Protection Expenditure (% of GDP)  40546      25.6       5.0      27.2       15.7       34.3
National Corruption Perception Index               40546      24.4      13.5      21.0        7.0       53.0

9.4 Create Descriptive Summary Statistics for the Emigration-Affected Subset

The code generates and displays summary statistics for selected variables from the ess3_emigration dataset.

  • Data Selection: Uses dplyr::select() to extract relevant columns, including welfare support and migration metrics.

  • Renaming Columns: Updates column names to descriptive labels for clarity.

  • Summary Statistics: Computes n, mean, standard deviation, median, minimum and maximum with psych::describe().

  • Presentation: Rounds the statistics to one decimal place and prints the resulting table.

# create a new dataset based on ess3_emigration
descriptive_emigration <- dplyr::select(ess3_emigration, 
                                        indexwelfare_alesina_sbeqsoc_sbprvpv, 
                                        EU_net_migration_mean_4yr, log_EU_immigration_cumulative_4yr,
                                        log_EU_emigration_cumulative_4yr,
                                        regional_gdp_pps_eur_per_inhabitant_percentage_EU_average, 
                                        lknemny, gndr_dummyfemale, agea, lrscale, educ,
                                        unemployed_dummy, urban_dummy, trstprl, regional_unemployment,
                                        regional_population_density, regional_old_age_dependency,
                                        regional_foreign_born_population_log, national_unemployment_level,
                                        national_social_protection,
                                        national_corruption_perception)

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "indexwelfare_alesina_sbeqsoc_sbprvpv"] ="Support for Welfare State"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "gndr_dummyfemale"] ="Gender (Female)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "agea"] ="Age"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "unemployed_dummy"] ="Employment Status (Unemployed)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "urban_dummy"] ="Place of Residence (Urban Area)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "lrscale"] ="Left-Right Political Placement"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "national_unemployment_level"] ="National Unemployment Rate (20-64 years)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "regional_unemployment"] ="Regional Unemployment Rate (20-64 years)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "regional_population_density"] ="Regional Population Density"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "national_corruption_perception"] ="National Corruption Perception Index"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "trstprl"] ="Trust in National Parliament"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "national_social_protection"] ="National Social Protection Expenditure (% of GDP)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "EU_net_migration_mean_4yr"] ="Cumulative Net EU Migration (t-4)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "log_EU_immigration_cumulative_4yr"] ="Log of Cumulative EU Immigration (t-4)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "log_EU_emigration_cumulative_4yr"] ="Log of Cumulative EU Emigration (t-4)"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "lknemny"] ="Individual Economic Insecurities"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "regional_gdp_pps_eur_per_inhabitant_percentage_EU_average"] ="Regional Economic Disparities"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "regional_old_age_dependency"] ="Regional Old-Age Dependency Ratio"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "educ"] ="Education Level"

colnames(descriptive_emigration)[colnames(descriptive_emigration) == "regional_foreign_born_population_log"] ="Log of Regional Foreign-Born Population"

summary_table_emigration <- describe(descriptive_emigration)
summary_table_emigration <- dplyr::select(summary_table_emigration, n, mean, sd, median, min, max)
summary_table_emigration <- round(summary_table_emigration, 1)
summary_table_emigration
Variable                                               n      mean        sd    median        min        max
Support for Welfare State                          19651       0.0       0.6       0.1       -3.0        1.3
Cumulative Net EU Migration (t-4)                  19651  -73129.6   98904.5  -28463.0  -301223.0    -1590.0
Log of Cumulative EU Immigration (t-4)             19651      10.4       2.0      10.4        5.6       13.1
Log of Cumulative EU Emigration (t-4)              19651      11.1       1.7      11.6        7.7       13.6
Regional Economic Disparities                      19651      72.7      25.1      69.0       28.0      156.0
Individual Economic Insecurities                   18826       2.3       0.9       2.0        1.0        4.0
Gender (Female)                                    19651       0.5       0.5       1.0        0.0        1.0
Age                                                19651      49.1      18.0      49.0       15.0       96.0
Left-Right Political Placement                     19651       5.2       2.4       5.0        0.0       10.0
Education Level                                    19651       3.1       1.3       3.0        0.0        5.0
Employment Status (Unemployed)                     19651       0.0       0.1       0.0        0.0        1.0
Place of Residence (Urban Area)                    19651       0.3       0.4       0.0        0.0        1.0
Trust in National Parliament                       19651       3.7       2.5       4.0        0.0       10.0
Regional Unemployment Rate (20-64 years)           19651       8.3       4.6       7.2        2.5       28.5
Regional Population Density                        19651     203.2     448.1      87.5       23.2     4242.2
Regional Old-Age Dependency Ratio                  19651      29.4       5.0      28.9       18.1       50.2
Log of Regional Foreign-Born Population            19651      10.7       1.7      10.8        6.6       13.9
National Unemployment Rate (20-64 years)           19651       8.2       3.8       7.1        4.4       19.4
National Social Protection Expenditure (% of GDP)  19651      19.4       4.9      19.4       12.0       29.2
National Corruption Perception Index               19651      44.7      10.2      41.0       30.0       64.0

9.5 Create a Table for an Overview of the Countries Impacted by EU Immigration and Emigration

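The code stacks the emigration and immigration subsets, keeps one row per country and ESS year, and marks with an "x" whether that country-year belongs to the immigration-affected or the emigration-affected group. The resulting table gives an overview of the countries, years and group membership.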

eu_impact <- rbind(ess3_emigration, ess3_immigration)

eu_impact <- dplyr::select(eu_impact, name_country, ess_year, 
                           immigration_affected, emigration_affected) %>% distinct()

eu_impact <- eu_impact %>%
  dplyr::mutate(emigration_affected = ifelse(emigration_affected == 1, "x", emigration_affected))

eu_impact <- eu_impact %>%
  dplyr::mutate(immigration_affected = ifelse(immigration_affected == 1, "x", immigration_affected))

colnames(eu_impact)[colnames(eu_impact) == "name_country"] ="Country"
colnames(eu_impact)[colnames(eu_impact) == "ess_year"] ="Year"
colnames(eu_impact)[colnames(eu_impact) == "immigration_affected"] ="Affected by EU Immigration"
colnames(eu_impact)[colnames(eu_impact) == "emigration_affected"] ="Affected by EU Emigration"

eu_impact
Country         Year  Affected by EU Immigration  Affected by EU Emigration
Bulgaria        2008  0                           x
Estonia         2008  0                           x
Estonia         2016  0                           x
Spain           2016  0                           x
Hungary         2016  0                           x
Italy           2016  0                           x
Lithuania       2016  0                           x
Latvia          2008  0                           x
Poland          2008  0                           x
Poland          2016  0                           x
Portugal        2008  0                           x
Portugal        2016  0                           x
Romania         2008  0                           x
Slovenia        2008  0                           x
Slovenia        2016  0                           x
Austria         2016  x                           0
Belgium         2016  x                           0
Switzerland     2008  x                           0
Switzerland     2016  x                           0
Czech Republic  2008  x                           0
Czech Republic  2016  x                           0
Germany         2008  x                           0
Germany         2016  x                           0
Denmark         2008  x                           0
Greece          2008  x                           0
Spain           2008  x                           0
Finland         2016  x                           0
France          2016  x                           0
Hungary         2008  x                           0
Ireland         2016  x                           0
Netherlands     2016  x                           0
Netherlands     2008  x                           0
Norway          2008  x                           0
Norway          2016  x                           0
Sweden          2008  x                           0
Sweden          2016  x                           0
Slovakia        2008  x                           0
United Kingdom  2008  x                           0
United Kingdom  2016  x                           0
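
The overview above implies that every country-year falls into exactly one of the two groups, and that some countries (Spain and Hungary in the printed table) change group between rounds. A small sanity check along these lines might look as follows. This is a sketch on a toy data frame whose five rows are copied from the table above; the object names overview, no_duplicates, one_flag_each and switchers are illustrative and do not appear in the original code.

```r
# Toy stand-in for the overview table (rows copied from the printed output)
overview <- data.frame(
  Country = c("Spain", "Spain", "Hungary", "Hungary", "Germany"),
  Year = c(2008, 2016, 2008, 2016, 2008),
  immigration_affected = c(1, 0, 1, 0, 1),
  emigration_affected = c(0, 1, 0, 1, 0)
)

# Each country-year should appear once and carry exactly one of the two flags
key <- paste(overview$Country, overview$Year)
no_duplicates <- !any(duplicated(key))
one_flag_each <- all(overview$immigration_affected + overview$emigration_affected == 1)

# Countries whose group membership differs between the ESS rounds
switchers <- names(which(tapply(overview$immigration_affected, overview$Country,
                                function(flag) length(unique(flag)) > 1)))
switchers
```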

9.6 Create a Cross-Correlation Table

Data Preparation:

  • Created a dataframe cross_correlation with selected z-scores from ess3.

  • Removed any rows with missing values using na.omit().

Correlation Calculation:

  • Calculated the correlation matrix of the variables.

  • Rounded the correlations to two decimal places.

Table Formatting:

  • Converted the correlation matrix to a dataframe cor_matrix_df.

  • Added variable names and reordered columns for readability.

  • Updated variable names with descriptive labels:

    • “V1: Favours reduction in income differences”

    • “V2: Favours government responsibility for the standard of living for the old”

    • “V3: Favours government responsibility for the standard of living of the unemployed”

    • “V4: Favours government responsibility for child care services”

    • “V5: Agrees that social benefits lead to a more equal society”

    • “V6: Agrees that social benefits prevent widespread poverty”

Table Generation:

  • Generated an HTML table of the cross-correlation matrix using kable() with kableExtra styling options for enhanced readability.

# create the data frame based on the welfare support items (z-transformed)
cross_correlation <- data.frame(
  gincdif_rescaled_z_score = ess3$gincdif_rescaled_z_score, 
  gvslvol_z_score = ess3$gvslvol_z_score, 
  gvslvue_z_score = ess3$gvslvue_z_score, 
  gvcldcr_z_score = ess3$gvcldcr_z_score, 
  sbeqsoc_rescaled_z_score = ess3$sbeqsoc_rescaled_z_score, 
  sbprvpv_rescaled_z_score = ess3$sbprvpv_rescaled_z_score
)

# remove missing values 
cross_correlation <- na.omit(cross_correlation)

# compute the correlation matrix and round it to two decimal places
cor_matrix <- cor(cross_correlation)
cor_matrix_rounded <- round(cor_matrix, 2)

# convert to a data frame and move the variable names into the first column
cor_matrix_df <- as.data.frame(cor_matrix_rounded)
cor_matrix_df$Variables <- rownames(cor_matrix_df)
rownames(cor_matrix_df) <- NULL

cor_matrix_df <- cor_matrix_df[c("Variables", 
                                 setdiff(names(cor_matrix_df), "Variables"))]

# change the row descriptions to more descriptive labels
cor_matrix_df$Variables <- c("V1: Favours reduction in income differences",
                              "V2: Favours government responsibility for the standard of living for the old",
                              "V3: Favours government responsibility for the standard of living of the unemployed",
                              "V4: Favours government responsibility for child care services",
                              "V5: Agrees that social benefits lead to a more equal society",
                              "V6: Agrees that social benefits prevent widespread poverty")

# shorten the column headers to V1 ... V6
colnames(cor_matrix_df)[-1] <- paste0("V", 1:(ncol(cor_matrix_df) - 1))

# build the styled HTML table (kable_styling() comes from kableExtra)
library(kableExtra)

cor_table <- kable(cor_matrix_df, format = "html", table.attr = "style='width:70%;'", align = "c", caption = "Cross-correlations of Welfare Attitudes") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) 

# display the results 
cor_table
Cross-correlations of Welfare Attitudes
Variables                                                                              V1     V2     V3     V4     V5     V6
V1: Favours reduction in income differences                                          1.00   0.22   0.23   0.20   0.04  -0.01
V2: Favours government responsibility for the standard of living for the old         0.22   1.00   0.48   0.46  -0.01  -0.03
V3: Favours government responsibility for the standard of living of the unemployed   0.23   0.48   1.00   0.42   0.10   0.04
V4: Favours government responsibility for child care services                        0.20   0.46   0.42   1.00   0.01  -0.01
V5: Agrees that social benefits lead to a more equal society                         0.04  -0.01   0.10   0.01   1.00   0.50
V6: Agrees that social benefits prevent widespread poverty                          -0.01  -0.03   0.04  -0.01   0.50   1.00

9.7 Cronbach’s Alpha

Data Preparation:

  • Created a dataframe cronbach with selected z-scores from ess3.

  • Removed rows with missing values using na.omit().

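Cronbach’s Alpha Calculation:

  • Calculated Cronbach’s Alpha using the psych package’s alpha() function.

  • Displayed the reliability statistics that result when each item is dropped in turn (the alpha.drop element of the result); the overall coefficient is stored in the total element.

Alternative Calculation:

  • Cronbach’s Alpha can also be calculated with the ltm package’s cronbach.alpha() function; that call is not part of the code below.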

# create a data frame based on the welfare support items (z-transformed)
cronbach <- data.frame(
  gincdif_rescaled_z_score = ess3$gincdif_rescaled_z_score, 
  gvslvol_z_score = ess3$gvslvol_z_score, 
  gvslvue_z_score = ess3$gvslvue_z_score, 
  gvcldcr_z_score = ess3$gvcldcr_z_score, 
  sbeqsoc_rescaled_z_score = ess3$sbeqsoc_rescaled_z_score, 
  sbprvpv_rescaled_z_score = ess3$sbprvpv_rescaled_z_score
)

# remove missing values 
cronbach <- na.omit(cronbach)

# calculate cronbach alpha 
cronbach_alpha <- psych::alpha(cronbach)

# display the reliability statistics obtained when each item is dropped
# (alpha.drop is spelled out; `$alpha` only reached it through partial matching)
kable(cronbach_alpha$alpha.drop, format = "markdown", caption = "Cronbach's Alpha if an Item Is Dropped")
Cronbach’s Alpha if an Item Is Dropped
Item dropped              raw_alpha  std.alpha    G6(smc)  average_r        S/N   alpha se      var.r      med.r
gincdif_rescaled_z_score  0.5474620  0.5468078  0.5921174  0.1944019  1.2065691  0.0029994  0.0552846  0.0656760
gvslvol_z_score           0.4704654  0.4705339  0.4981834  0.1509154  0.8886949  0.0034754  0.0341305  0.0668684
gvslvue_z_score           0.4417237  0.4412168  0.4846270  0.1363829  0.7896028  0.0036768  0.0406485  0.0227064
gvcldcr_z_score           0.4794258  0.4793742  0.5175980  0.1555146  0.9207655  0.0034245  0.0394425  0.0668684
sbeqsoc_rescaled_z_score  0.5549542  0.5533869  0.5485753  0.1985991  1.2390743  0.0028190  0.0398074  0.2084320
sbprvpv_rescaled_z_score  0.5763493  0.5752176  0.5646846  0.2131122  1.3541463  0.0026962  0.0341230  0.2084320
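
The table above reports alpha with one item removed at a time. For the overall coefficient, psych::alpha() keeps it in the total element of its result (total$raw_alpha). As a cross-check that does not depend on any package, the coefficient can also be computed from its definition, alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score). The sketch below assumes listwise deletion, as in the code above; the function name cronbach_alpha_manual and the toy items are made up for illustration.

```r
# Cronbach's alpha from its definition, base R only
cronbach_alpha_manual <- function(items) {
  items <- na.omit(as.data.frame(items))     # listwise deletion of missing values
  k <- ncol(items)                           # number of items
  item_variances <- sapply(items, var)
  total_variance <- var(rowSums(items))      # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

# Toy data: two items that move together perfectly with equal variance
consistent_items <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))
alpha_consistent <- cronbach_alpha_manual(consistent_items)

# Toy data: two uncorrelated items
unrelated_items <- data.frame(a = c(1, 2, 1, 2), b = c(1, 1, 2, 2))
alpha_unrelated <- cronbach_alpha_manual(unrelated_items)
```

Applied to the cronbach data frame, the function should agree with the raw alpha reported by psych, up to rounding.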

9.8 Create a Graph for Welfare State Support by NUTS Regions

This code generates a map displaying the average level of public support for the welfare state across various NUTS regions. For this purpose, the welfare index is used. Regions with lower levels of support are highlighted in red, while those with higher levels of support are shown in green.

# create a new dataset based on ess3
graph7 <- dplyr::select(ess3, nuts2, ess_year, 
                        indexwelfare_alesina_sbeqsoc_sbprvpv, 
                        geometry)

# remove missing values 
graph7 <- graph7 %>%
  drop_na(nuts2)  %>% 
  drop_na(indexwelfare_alesina_sbeqsoc_sbprvpv)

graph7 <- graph7 %>%
  group_by(nuts2, geometry) %>%
  summarize(mean_value = mean(indexwelfare_alesina_sbeqsoc_sbprvpv, 
                              na.rm = TRUE))

graph7_sf <- st_as_sf(graph7)

graph7_sf$mean_value <- round(graph7_sf$mean_value, 2)

# Calculate breaks based on the range of mean_value
breaks <- seq(min(graph7_sf$mean_value, na.rm = TRUE), max(graph7_sf$mean_value, 
                                                           na.rm = TRUE), length.out = 10)

# specify the labels 
labels <- c(
  "Very Low Support (-1.030 to -0.861)",
  "Low Support (-0.861 to -0.692)",
  "Moderately Low Support (-0.692 to -0.523)",
  "Slightly Low Support (-0.523 to -0.354)",
  "Below Average Support (-0.354 to -0.185)",
  "Near Average Support (-0.185 to -0.016)",
  "Slightly Above Average Support (-0.016 to 0.153)",
  "Moderate Support (0.153 to 0.322)",
  "Moderately High Support (0.322 to 0.491)",
  "High Support (0.491 to 0.660)"
)

# specify custom colours 
custom_colors <- c(
  "#de425b", "#e76b77", "#ee8e94", "#f2afb2", 
  "#f3d0d1", "#f1f1f1", "#d1e1dc", 
  "#aecdc2", "#6aaa96", "#3e8a6d"
)

# create the plot 
plot7 <- ggplot(data = graph7_sf) +
  geom_sf(aes(fill = mean_value)) +
  # region borders; linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_sf(aes(geometry = geometry), color = "grey", fill = NA, linewidth = 0.5) + 
  scale_fill_gradientn(colors = custom_colors, 
                       name = "Support for the Welfare State", 
                       breaks = breaks, 
                       labels = labels)  +
  labs(
    title = "Average Support for the Welfare State by NUTS regions"
  ) +
  theme_minimal() +
  theme(
    axis.text = element_blank(),  
    axis.title = element_blank(), 
    legend.text = element_text(size = 9.5),  
    legend.title = element_text(size = 12),  
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 14),
    panel.grid.major = element_blank(),  
    panel.grid.minor = element_blank()
  ) +
  guides(
    fill = guide_legend(keywidth = 1.25, keyheight = 1.25)  
  )

# display the plot 
plot7 
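
The legend labels in the code above are typed by hand and describe ten bins of width 0.169 between -1.030 and 0.660, which corresponds to eleven break points, whereas seq(..., length.out = 10) yields ten. One way to keep breaks and labels in step is to derive the labels from the breaks. The following is a sketch in base R on made-up regional means; the names n_bins, bin_breaks, bin_labels and support_bin are illustrative and not part of the original code.

```r
# Toy regional means (made up) standing in for graph7_sf$mean_value
mean_value <- c(-1.03, -0.40, -0.10, 0.05, 0.30, 0.66)

# Ten equal-width bins need eleven break points
n_bins <- 10
bin_breaks <- seq(min(mean_value), max(mean_value), length.out = n_bins + 1)

# Build "lower to upper" labels from the breaks themselves
bin_labels <- sprintf("%.3f to %.3f", head(bin_breaks, -1), tail(bin_breaks, -1))

# Assign each region to its bin; include.lowest keeps the minimum in the first bin
support_bin <- cut(mean_value, breaks = bin_breaks, labels = bin_labels,
                   include.lowest = TRUE)
table(support_bin)
```

With a binned variable like support_bin, a discrete scale such as scale_fill_manual(values = custom_colors) would map one colour to each bin; whether that suits the map better than the continuous gradient is a design choice left open here.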

9.9 Factor Analysis and Item-Total Correlations

## Item total Correlations ##
# Calculate item-total correlations
item_total_correlation <- sapply(ess3[, c("gvcldcr_z_score", 
                                          "gvslvue_z_score", 
                                          "gvslvol_z_score",
                                          "gincdif_rescaled_z_score",
                                          "sbeqsoc_rescaled_z_score",
                                          "sbprvpv_rescaled_z_score")], 
                                 function(item) cor(item,
                                          ess3$indexwelfare_alesina_sbeqsoc_sbprvpv,
                                          use = "pairwise.complete.obs"))

# load the tibble package  
library(tibble)

# Convert to a data frame
item_total_correlation_df <- tibble(
  Item = names(item_total_correlation),
  Correlation = item_total_correlation
)

# Attach the descriptive labels by variable name, so that each label stays with
# its item regardless of the sort order
item_labels <- c(
  gincdif_rescaled_z_score = "V1: Favours reduction in income differences",
  gvslvol_z_score          = "V2: Favours government responsibility for the standard of living for the old",
  gvslvue_z_score          = "V3: Favours government responsibility for the standard of living of the unemployed",
  gvcldcr_z_score          = "V4: Favours government responsibility for child care services",
  sbeqsoc_rescaled_z_score = "V5: Agrees that social benefits lead to a more equal society",
  sbprvpv_rescaled_z_score = "V6: Agrees that social benefits prevent widespread poverty"
)

item_total_correlation_df <- item_total_correlation_df %>%
  dplyr::mutate(Item = unname(item_labels[Item])) %>%
  arrange(desc(Correlation)) 


item_total_correlation_df <- kable(item_total_correlation_df, 
                                   format = "html", 
                                   table.attr = "style='width:70%;'", 
                                   align = "c", 
                                   caption = "Item-total Correlations for Welfare State Support Index") %>%
  kable_styling(bootstrap_options = c("striped", "hover", 
                                      "condensed", "responsive")) 

# display the results 
item_total_correlation_df
Item-total Correlations for Welfare State Support Index
Item                                                                                Correlation
V3: Favours government responsibility for the standard of living of the unemployed  0.6740552
V2: Favours government responsibility for the standard of living for the old        0.6336937
V4: Favours government responsibility for child care services                       0.6190395
V1: Favours reduction in income differences                                         0.5010095
V5: Agrees that social benefits lead to a more equal society                        0.4844576
V6: Agrees that social benefits prevent widespread poverty                          0.4409925
## Factor Analysis ##
library(psych)

# Conduct exploratory factor analysis
fa_result <- fa(ess3[, c("gvcldcr_z_score", "gvslvue_z_score", 
                         "gvslvol_z_score",
                         "gincdif_rescaled_z_score", 
                         "sbeqsoc_rescaled_z_score", 
                         "sbprvpv_rescaled_z_score")], 
                nfactors = 1, rotate = "none")

# Collect the loadings in a data frame; fa() keeps the input column order,
# so the labels follow that order (sbeqsoc is V5, sbprvpv is V6)
loadings_df <- data.frame(
  Variable = c("V4: Favours government responsibility for child care services",
               "V3: Favours government responsibility for the standard of living of the unemployed",
               "V2: Favours government responsibility for the standard of living for the old",
               "V1: Favours reduction in income differences",
               "V5: Agrees that social benefits lead to a more equal society",
               "V6: Agrees that social benefits prevent widespread poverty"),
  Loading = round(fa_result$loadings[, 1], 3)  
)

loadings_df <- loadings_df %>%
  arrange(desc(Loading)) 

# Print a nicely formatted table using kable
kable(loadings_df, caption = "Factor Loadings for Welfare Index Items", 
      col.names = c("Item", "Factor Loading"))
Factor Loadings for Welfare Index Items
                          Item                                                                                Factor Loading
gvslvol_z_score           V2: Favours government responsibility for the standard of living for the old        0.705
gvslvue_z_score           V3: Favours government responsibility for the standard of living of the unemployed  0.681
gvcldcr_z_score           V4: Favours government responsibility for child care services                       0.627
gincdif_rescaled_z_score  V1: Favours reduction in income differences                                         0.323
sbeqsoc_rescaled_z_score  V5: Agrees that social benefits lead to a more equal society                        0.057
sbprvpv_rescaled_z_score  V6: Agrees that social benefits prevent widespread poverty                          0.016
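
The item-total correlations earlier in this section correlate each item with an index that contains the item itself, which tends to inflate the coefficients. A common complement is the corrected (item-rest) correlation, where each item is compared with the mean of the remaining items. The sketch below assumes the index is a simple mean of its items, which the code in this section does not confirm; the function name item_rest_correlation and the toy data are made up for illustration.

```r
# Hypothetical helper: correlate each item with the mean of the *other* items
item_rest_correlation <- function(items) {
  items <- as.data.frame(items)
  sapply(names(items), function(item) {
    rest <- rowMeans(items[setdiff(names(items), item)], na.rm = TRUE)
    cor(items[[item]], rest, use = "pairwise.complete.obs")
  })
}

# Toy items (values made up): v1 and v2 move together, v3 runs against them
toy_items <- data.frame(
  v1 = c(1, 2, 3, 4, 5),
  v2 = c(2, 1, 4, 3, 5),
  v3 = c(5, 4, 3, 2, 1)
)

item_rest <- item_rest_correlation(toy_items)
round(item_rest, 3)
```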

9.10 Create a Graph for Annual Growth in Stocks of EU Movers of Working Age

The data can be obtained from the following source: https://ec.europa.eu/eurostat/databrowser/view/lfst_lmbpcita__custom_13154924/default/table?lang=en

This code generates two graphs. The first plots the total stock of EU movers of working age (20-64) by year, read from the Eurostat export. The second plots the annual growth rate of that stock for 2008-2018, with the percentages entered manually.

eu_movers <- read_csv("lfst_lmbpcita_page_linear.csv")

eu_movers <- dplyr::select(eu_movers, TIME_PERIOD, OBS_VALUE)


ggplot(eu_movers, aes(x = TIME_PERIOD, y = OBS_VALUE)) +
  geom_line(color = "#449777", linewidth = 1) +  # Plot the line (linewidth replaces size in ggplot2 >= 3.4.0)
  geom_point(color = "#449777", size = 2) +    # Plot points on the line
  labs(title = "Stocks of EU Movers of Working Age (20-64)", 
       x = "Year", 
       caption = "Source: Eurostat, variable lfst_lmbpcita",
       y = "Total Stock (in thousands)") +
  scale_x_continuous(breaks = seq(min(eu_movers$TIME_PERIOD), 
                                  max(eu_movers$TIME_PERIOD), by = 1)) +  # Show all years
  scale_y_continuous(
    limits = c(4000, 11000),                   
    breaks = seq(4000, 11000, by = 1000),       
    labels = function(x) format(x, big.mark = "'", scientific = FALSE)  
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor.x = element_blank(),  
    legend.title = element_blank()  
  )

years <- 2008:2018
LFS_20_64 <- c(7.1, 3.2, 1.0, 3.8, 4.0, 4.0, 4.1, 5.3, 5.0, 5.6, 1.5)  # now has 11 elements

# Combine into a data frame
data <- data.frame(
  Year = years,
  LFS_20_64 = LFS_20_64
)

library(ggplot2)

ggplot(data, aes(x = Year)) +
  geom_line(aes(y = LFS_20_64, 
                color = "Stock of EU Movers of Working Age"), 
            size = 1) +
  scale_color_manual(
    values = c("Stock of EU Movers of Working Age" = "#449777")  
  ) +
  scale_x_continuous(breaks = seq(min(data$Year), 
                                  max(data$Year), by = 1)) +  
  scale_y_continuous(limits = c(0, 8), 
                     labels = percent_format(scale = 1)) +  # Values are already in %
  labs(
    y = "Annual Growth (in %)",
    title = "Annual Growth in Stocks of EU Movers of Working Age (20-64)",
    caption = "Source: Eurostat, variable lfst_lmbpcita (2024)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor.x = element_blank(),  
    legend.title = element_blank()  
  )

9.11 Create Graphs for the Impact of Intra-EU Migration

This code processes Eurostat immigration and emigration data for European countries. The source files cover 2002 to 2023 in five extracts (2002-2003, 2004-2006, 2007-2012, 2013-2019 and 2020-2023), which are merged into two country-year panels: one for immigration and one for emigration. Because the panels span 2002 to 2022, observations for 2023 are dropped in the merge. Country codes are harmonised (UK to GB, EL to GR), and missing values for specific countries (e.g., Switzerland and Poland) are filled in from external sources. Net migration and four-year rolling sums are then calculated, and the plots are generated using the ggplot2 package.
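
Each of the ten extracts in this section goes through the same steps: keep three columns, rename them and recode two country codes. As a sketch of how these steps could be bundled, the base R helper below performs them in one call. The function name `clean_flows()` and the toy input are hypothetical; only the column names `TIME_PERIOD`, `geo` and `OBS_VALUE` and the two recodes are taken from the code in this section.

```r
# Hypothetical helper: keep the three relevant columns, rename them,
# and harmonise the country codes (UK -> GB, EL -> GR)
clean_flows <- function(df, value_name) {
  out <- as.data.frame(df)[, c("TIME_PERIOD", "geo", "OBS_VALUE")]
  names(out) <- c("ess_year", "cntry", value_name)
  recode <- c(UK = "GB", EL = "GR")
  hit <- out$cntry %in% names(recode)
  out$cntry[hit] <- unname(recode[out$cntry[hit]])
  out
}

# Toy input that mimics the layout of a Eurostat extract (made-up values)
toy <- data.frame(
  TIME_PERIOD = c(2020, 2020, 2021),
  geo         = c("UK", "EL", "DE"),
  OBS_VALUE   = c(100, 50, 200),
  OBS_FLAG    = c(NA, "p", NA),
  stringsAsFactors = FALSE
)

toy_clean <- clean_flows(toy, "EU_immigration_flow")
print(toy_clean)
```

A helper of this kind could be applied to each `read_csv()` result before the extracts are combined with `rbind()`.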

# Create a panel data frame 
set.seed(123) 
ess_year <- 2002:2022
cntry <- c("AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "ES", 
            "FI", "FR", "GB", "GR", "HR", "HU", "IE", "IS", "IT", "LT", 
            "LU", "LV", "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK")

panel_immigration <- data.frame(ess_year = rep(ess_year, length(cntry)), 
                                cntry = rep(cntry, each = length(ess_year)))
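
As a quick sanity check on the panel skeleton, the same balanced country-year grid can be built with `expand.grid()` in base R. The sketch below uses separate object names (`years_chk`, `cntry_chk`, `skeleton`), which are hypothetical, so that it does not overwrite the objects used in the analysis. With 30 countries and 21 years, the grid should have 630 rows and no duplicated country-year pairs.

```r
years_chk <- 2002:2022
cntry_chk <- c("AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "ES",
               "FI", "FR", "GB", "GR", "HR", "HU", "IE", "IS", "IT", "LT",
               "LU", "LV", "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK")

# expand.grid() varies the first argument fastest, so years cycle within each country
skeleton <- expand.grid(ess_year = years_chk, cntry = cntry_chk,
                        stringsAsFactors = FALSE)

nrow(skeleton)            # number of country-year rows
anyDuplicated(skeleton)   # 0 means every country-year pair occurs once
```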

# Immigration from 2020 to 2023 

immigration_flows_2020_2023 <- read_csv("Immigration_flows_2020_2023.csv")

immigration_flows_2020_2023 <- dplyr::select(immigration_flows_2020_2023,
                                             TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2020_2023 <- immigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", 
                        cntry))

## 2013 to 2019 
immigration_flows_2013_2019 <- read_csv("Immigration_flows_2013_2019.csv")

immigration_flows_2013_2019 <- dplyr::select(immigration_flows_2013_2019,
                                             TIME_PERIOD, geo, OBS_VALUE) 

immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2013_2019 <- immigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

## 2007 to 2012 
immigration_flows_2007_2012 <- read_csv("Immigration_flows_2007_2012.csv")
immigration_flows_2007_2012 <- dplyr::select(immigration_flows_2007_2012,
                                             TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2007_2012 <- immigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

## 2004 to 2006 
immigration_flows_2004_2006 <- read_csv("Immigration_flows_2004_2006.csv")

immigration_flows_2004_2006 <- dplyr::select(immigration_flows_2004_2006,
                                             TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2004_2006 <- immigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

## 2002 to 2003 
immigration_flows_2002_2003 <- read_csv("Immigration_flows_2002_2003.csv")

immigration_flows_2002_2003 <- dplyr::select(immigration_flows_2002_2003,
                                             TIME_PERIOD, 
                                             geo, OBS_VALUE) 

immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  dplyr::rename(EU_immigration_flow = OBS_VALUE)


immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  dplyr::rename(cntry = geo)

immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
immigration_flows_2002_2003 <- immigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

## Merge everything together 

immigration <- rbind(immigration_flows_2020_2023, 
                     immigration_flows_2013_2019,
                     immigration_flows_2007_2012, 
                     immigration_flows_2004_2006,
                     immigration_flows_2002_2003)

panel_immigration <- merge(panel_immigration, immigration, by = c("cntry", "ess_year"), 
                           all.x = TRUE)

# Some values are still missing and need to be filled in. For Switzerland, immigration by citizenship is used instead:
# https://ec.europa.eu/eurostat/databrowser/view/migr_imm1ctz__custom_12179741/default/table?lang=en
panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2004 & cntry == "CH", 58103,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2005 & cntry == "CH", 58954,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2006 & cntry == "CH", 66003,
                                       EU_immigration_flow))


panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2008 & cntry == "CH", 113575,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2009 & cntry == "CH", 91138,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2010 & cntry == "CH", 91208,
                                       EU_immigration_flow))

# Poland 2008: https://stat.gov.pl/en/topics/population/internationa-migration/main-directions-of-emigration-and-immigration-in-the-years-1966-2020-migration-for-permanent-residence,2,2.html
panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2008 & cntry == "PL", 10834,
                                       EU_immigration_flow))


remove(immigration_flows_2002_2003,
       immigration_flows_2004_2006,
       immigration_flows_2007_2012,
       immigration_flows_2013_2019,
       immigration_flows_2020_2023)

# Further missing values are filled in from the OECD International Migration Database:
# https://data-explorer.oecd.org/vis?fs[0]=Topic%2C1%7CSociety%23SOC%23%7CMigration%23SOC_MIG%23&pg=0&fc=Topic&bp=true&snb=3&vw=tb&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_MIG%40DF_MIG&df[ag]=OECD.ELS.IMD&df[vs]=1.0&dq=.EU15.A.B11._T...&pd=2002%2C&to[TIME_PERIOD]=false

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "AT", 17188,
                                       EU_immigration_flow))


panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "BE", 30225,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "CH", 49302,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "IE", 15500,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "LU", 8200,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2002 & cntry == "PT", 4301,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "BE", 30457,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "CH", 49751,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "DE", 98709,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "ES", 69924,
                                       EU_immigration_flow))


panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "HU", 1527,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "IE", 17900,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "LU", 9182,
                                       EU_immigration_flow))

panel_immigration <- panel_immigration %>%
  mutate(EU_immigration_flow = if_else(ess_year == 2003 & cntry == "PT", 3843,
                                       EU_immigration_flow))
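
The replacement values in this section are applied one `mutate()` call at a time. As an alternative sketch, the same kind of correction can be expressed as a small lookup table that is merged into the panel, which keeps all manual values in one place. The toy panel and the object names `panel`, `patches` and `patched` are hypothetical; the three replacement figures are the ones used above for Switzerland (2004, 2005) and Poland (2008). As in the `if_else()` calls, a value in the lookup table overwrites whatever the panel contains for that country and year.

```r
# Toy panel (hypothetical rows); NA marks a missing flow
panel <- data.frame(
  cntry    = c("AT", "CH", "CH", "PL"),
  ess_year = c(2004, 2004, 2005, 2008),
  EU_immigration_flow = c(500, NA, 100, NA),
  stringsAsFactors = FALSE
)

# Replacement values collected in one table
patches <- data.frame(
  cntry    = c("CH", "CH", "PL"),
  ess_year = c(2004, 2005, 2008),
  patch    = c(58103, 58954, 10834),
  stringsAsFactors = FALSE
)

# Left join the patches, then overwrite wherever a patch exists
patched <- merge(panel, patches, by = c("cntry", "ess_year"), all.x = TRUE)
patched$EU_immigration_flow <- ifelse(is.na(patched$patch),
                                      patched$EU_immigration_flow,
                                      patched$patch)
patched$patch <- NULL

print(patched)
```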

#### Emigration #### 
# Create a panel data frame 
set.seed(123) 
ess_year <- 2002:2022
cntry <- c("AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "ES", 
            "FI", "FR", "GB", "GR", "HR", "HU", "IE", "IS", "IT", "LT", 
            "LU", "LV", "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK")

panel_emigration <- data.frame(ess_year = rep(ess_year, 
                                              length(cntry)), 
                               cntry = rep(cntry, 
                                           each = length(ess_year)))

# From 2020 to 2023 

emigration_flows_2020_2023 <- read_csv("Emigration_flows_2020_2023.csv")

emigration_flows_2020_2023 <- dplyr::select(emigration_flows_2020_2023,
                                            TIME_PERIOD, 
                                            geo, OBS_VALUE) 

emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2020_2023 <- emigration_flows_2020_2023 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# From 2013 to 2019 
emigration_flows_2013_2019 <- read_csv("Emigration_flows_2013_2019.csv")

emigration_flows_2013_2019 <- dplyr::select(emigration_flows_2013_2019,
                                            TIME_PERIOD, 
                                            geo, OBS_VALUE) 

emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2013_2019 <- emigration_flows_2013_2019 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# From 2007 to 2012

emigration_flows_2007_2012 <- read_csv("Emigration_flows_2007_2012.csv")

emigration_flows_2007_2012 <- dplyr::select(emigration_flows_2007_2012,
                                            TIME_PERIOD, geo, OBS_VALUE) 

emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2007_2012 <- emigration_flows_2007_2012 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

## From 2004 to 2006 
emigration_flows_2004_2006 <- read_csv("Emigration_flows_2004_2006.csv")

emigration_flows_2004_2006 <- dplyr::select(emigration_flows_2004_2006,
                                            TIME_PERIOD, geo, OBS_VALUE) 

emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2004_2006 <- emigration_flows_2004_2006 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))

# emigration flows from 2002 to 2003 
emigration_flows_2002_2003 <- read_csv("Emigration_flows_2002_2003.csv")

emigration_flows_2002_2003 <- dplyr::select(emigration_flows_2002_2003,
                                            TIME_PERIOD, geo, OBS_VALUE) 

emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  dplyr::rename(EU_emigration_flow = OBS_VALUE)


emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  dplyr::rename(cntry = geo)

emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  dplyr::rename(ess_year = TIME_PERIOD)

# Change UK to GB 
emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "UK", "GB", cntry))

# Change EL to GR 
emigration_flows_2002_2003 <- emigration_flows_2002_2003 %>%
  mutate(cntry = ifelse(cntry == "EL", "GR", cntry))


## Merge everything together 

emigration <- rbind(emigration_flows_2020_2023, emigration_flows_2013_2019,
                    emigration_flows_2007_2012, emigration_flows_2004_2006,
                    emigration_flows_2002_2003)

panel_emigration <- merge(panel_emigration, emigration, by = c("cntry", "ess_year"), 
                          all.x = TRUE)

# I do have some missing values that I will adjust 

# Poland 2008: https://stat.gov.pl/en/topics/population/internationa-migration/main-directions-of-emigration-and-immigration-in-the-years-1966-2020-migration-for-permanent-residence,2,2.html
panel_emigration <- panel_emigration %>%
  mutate(EU_emigration_flow = if_else(ess_year == 2008 & cntry == "PL", 24946,
                                      EU_emigration_flow))

# Poland 2006: https://stat.gov.pl/en/topics/population/internationa-migration/main-directions-of-emigration-and-immigration-in-the-years-1966-2020-migration-for-permanent-residence,2,2.html
panel_emigration <- panel_emigration %>%
  mutate(EU_emigration_flow = if_else(ess_year == 2006 & cntry == "PL", 40618,
                                      EU_emigration_flow))

# Switzerland 2008: 
panel_emigration <- panel_emigration %>%
  mutate(EU_emigration_flow = if_else(ess_year == 2008 & cntry == "CH", 24552,
                                      EU_emigration_flow))

eu_migration <- merge(panel_immigration, panel_emigration, by = c("ess_year", "cntry"))

# Calculate annual net migration (immigration minus emigration)
eu_migration$EU_net_migration <- (eu_migration$EU_immigration_flow) - (eu_migration$EU_emigration_flow)


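# Calculate four-year rolling sums of the immigration and emigration flows.
# With align = "right", each window covers the current year and the three preceding years.
# rollapply() is provided by the zoo package, which is not among the packages loaded in
# section 1.1, so it is called with an explicit zoo:: prefix here.
# Note: because of na.rm = TRUE, missing years contribute zero to a window.

eu_migration <- eu_migration %>%
  group_by(cntry) %>%
  arrange(ess_year) %>%
  mutate(
    EU_immigration_cumulative_4yr = as.numeric(zoo::rollapply(EU_immigration_flow, 
                                                              width = 4, FUN = sum, 
                                                              na.rm = TRUE,
                                                              fill = NA, align = "right")),
    EU_emigration_cumulative_4yr = as.numeric(zoo::rollapply(EU_emigration_flow, 
                                                             width = 4, FUN = sum, 
                                                             na.rm = TRUE,
                                                             fill = NA, align = "right"))
  ) %>%
  ungroup()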

# Calculate four-year cumulative net migration from the two rolling sums
eu_migration$national_net_migration_4yr <- (eu_migration$EU_immigration_cumulative_4yr) - (eu_migration$EU_emigration_cumulative_4yr)
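
The rolling sums above use `width = 4`, `FUN = sum`, `na.rm = TRUE` and `align = "right"`. To make the effect of these settings concrete, the sketch below re-implements a right-aligned rolling sum in base R and applies it to a short made-up series. The function `roll_sum_right()` is a hypothetical stand-in for illustration, not the function used in the analysis.

```r
# Hypothetical base R stand-in for a right-aligned rolling sum
roll_sum_right <- function(x, width = 4, na.rm = TRUE) {
  n <- length(x)
  out <- rep(NA_real_, n)          # positions without a full window stay NA
  if (n >= width) {
    for (i in width:n) {
      out[i] <- sum(x[(i - width + 1):i], na.rm = na.rm)
    }
  }
  out
}

# Made-up yearly flows with one missing year
flows <- c(10, 20, NA, 40, 50)

with_na_rm    <- roll_sum_right(flows, na.rm = TRUE)   # missing year counts as zero
without_na_rm <- roll_sum_right(flows, na.rm = FALSE)  # any missing year makes the window NA

print(with_na_rm)
print(without_na_rm)
```

With `na.rm = TRUE`, a window that contains a missing year still returns a number, so a gap in the data lowers the cumulative figure rather than flagging it as missing. This is worth keeping in mind when interpreting the four-year totals for countries with incomplete series.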

# cumulative net-EU migration based on previous or next country of residence for 2008 and 2016 
graph3 <- eu_migration %>%
  filter(ess_year %in% c(2008, 2016))

custom_colors <- c("2008" = "#449777", "2016" = "#984464")

format_with_apostrophe <- function(x) {
  scales::label_number(big.mark = "'")(x)
}


plot3 <- ggplot(graph3, aes(x = national_net_migration_4yr, y = reorder(cntry, 
                                                                  national_net_migration_4yr), 
                            fill = factor(ess_year))) +
  geom_bar(stat = "identity", position = "stack") + 
  scale_fill_manual(values = custom_colors) +  # Use the custom colours defined above
  labs(
    title = "Cumulative Net EU Migration Trends by Country",
    x = "Cumulative Net EU Migration (t-4)",
    y = "Country",
    fill = "Year",
    caption = "Source: Eurostat (2024)" 
  ) + 
  geom_vline(xintercept = 0, color = "black", size = 0.5) +
  theme_minimal(base_size = 14) +  
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  
    axis.text.y = element_text(size = 12),  
    legend.position = "right",  
    legend.title = element_text(size = 12),  
    legend.text = element_text(size = 10),  
    plot.title = element_text(face = "bold", size = 14),  
    plot.caption = element_text(hjust = 0.5, size = 10),  
    panel.grid.y = element_blank()  
  ) + 
  scale_x_continuous(
    breaks = seq(-500000, 1500000, by = 250000),  
    labels = format_with_apostrophe  
  )

# display the plot 
plot3

# Net EU migration based on previous or next country of residence, summed by period (2004-2020)
# Note: the first period covers five years (2004-2008), the others four
graph4 <- eu_migration %>%
  filter(ess_year >= 2004 & ess_year <= 2020)

result <- graph4 %>%
  mutate(period = case_when(
    ess_year >= 2004 & ess_year <= 2008 ~ "2004-2008",
    ess_year >= 2009 & ess_year <= 2012 ~ "2009-2012",
    ess_year >= 2013 & ess_year <= 2016 ~ "2013-2016",
    ess_year >= 2017 & ess_year <= 2020 ~ "2017-2020"
  )) %>%
  group_by(cntry, period) %>%
  summarise(sum_EU_net_migration = sum(EU_net_migration, na.rm = TRUE), 
            .groups = "drop") %>%
  arrange(cntry, period)
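
The period assignment above can also be written with `cut()`, which makes the interval boundaries explicit. The following base R sketch applies the same four periods to a made-up data frame and sums net migration per country and period. The object names `toy` and `toy_sums`, the country labels "AA" and "BB" and all values are hypothetical.

```r
# Made-up country-year observations
toy <- data.frame(
  cntry            = c("AA", "AA", "AA", "BB", "BB"),
  ess_year         = c(2004, 2008, 2009, 2016, 2017),
  EU_net_migration = c(10, 5, -3, 7, NA),
  stringsAsFactors = FALSE
)

# Intervals are right-closed: (2003, 2008], (2008, 2012], (2012, 2016], (2016, 2020]
toy$period <- cut(toy$ess_year,
                  breaks = c(2003, 2008, 2012, 2016, 2020),
                  labels = c("2004-2008", "2009-2012", "2013-2016", "2017-2020"))

# Sum per country and period; na.pass keeps rows with NA so that
# sum(..., na.rm = TRUE) treats them as zero, as in the summarise() call
toy_sums <- aggregate(EU_net_migration ~ cntry + period, data = toy,
                      FUN = sum, na.rm = TRUE, na.action = na.pass)

print(toy_sums)
```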

# specify custom colours 
custom_colors <- c("2004-2008" = "#974464", "2009-2012" = "#cf9cac", 
                   "2013-2016" = "#95c2ad", "2017-2020" = "#449777")

plot4 <- ggplot(result, aes(x = sum_EU_net_migration, 
                            y = reorder(cntry, sum_EU_net_migration), 
                            fill = factor(period))) +
  geom_bar(stat = "identity", position = "dodge") + 
  scale_fill_manual(values = custom_colors) +  
  labs(
    title = "Cumulative Net EU Migration by Country",
    x = "Cumulative Net EU Migration",
    y = "Country",
    fill = "Period",
    caption = "Source: Eurostat (2024)"
  ) + 
  geom_vline(xintercept = 0, color = "black", size = 0.5) +
  theme_minimal(base_size = 14) + 
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  
    axis.text.y = element_text(size = 12),  
    legend.position = "right",  
    legend.title = element_text(size = 12), 
    legend.text = element_text(size = 10),  
    plot.title = element_text(face = "bold", size = 14)  
  ) +  
  coord_flip() + 
  scale_x_continuous(
    breaks = seq(-500000, 1500000, by = 250000),  
    labels = format_with_apostrophe  
  )

# display the plot 
plot4

# Create a graph for the top 5 receiving states 
graph5 <- eu_migration %>%
  filter(cntry %in% c("DE", "GB", "CH", "AT", "FR")) %>% 
  filter(ess_year >= 2008 & ess_year <= 2019) 

# specify custom colours 
custom_colors <- c(
  "DE" = "#449777",  
  "GB" = "#927951",  
  "CH" = "#8e9744",  
  "AT" = "#4d4497",  
  "FR" = "#984464"
)

# create the plot 
plot5 <- ggplot(graph5, aes(x = ess_year, y = EU_immigration_flow, 
                            group = cntry, color = reorder(cntry, -EU_immigration_flow))) +
  geom_line(size = 1) +  # Create the line graph
  geom_point(size = 2) +  # Add points to highlight each data point
  scale_color_manual(values = custom_colors) +  # Use custom colors
  labs(
    title = "Intra-EU Immigration Flow for Main Destination Countries",
    x = "Year",
    caption = "Source: Eurostat (2024)", 
    y = "Intra EU-Immigration Flow",
    color = "Country"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  
    axis.text.y = element_text(size = 12),  
    legend.position = "right",  
    legend.title = element_text(size = 12),  
    legend.text = element_text(size = 10),  
    plot.title = element_text(face = "bold", size = 14), 
    panel.grid.major.x = element_blank(),  
    panel.grid.minor.x = element_blank()   
  ) +
  scale_y_continuous(
    labels = format_with_apostrophe,  
    limits = c(0, NA) 
  ) +
  scale_x_continuous(breaks = seq(2008, 2019, by = 1))  

# display the plot 
plot5

# Emigration flows of the main sending countries 
graph6 <- eu_migration %>%
  filter(cntry %in% c("RO", "PL", "LT", "PT", "LV", "ES")) %>% 
  filter(ess_year >= 2008 & ess_year <= 2019) 

custom_colors <- c(
  "RO" = "#449777",  
  "PL" = "#927951",  
  "LT" = "#8e9744",  
  "PT" = "#8ebbd7",  
  "LV" = "#984464", 
  "ES" = "#4d4497"
)

plot6 <- ggplot(graph6, aes(x = ess_year, y = EU_emigration_flow, 
                            group = cntry, color = reorder(cntry, -EU_emigration_flow))) +
  geom_line(size = 1) + 
  geom_point(size = 2) +  
  scale_color_manual(values = custom_colors) +  
  labs(
    title = "Intra-EU Emigration Flow for Main Sending Countries",
    x = "Year",
    y = "Intra EU-emigration flow",
    color = "Country"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  
    axis.text.y = element_text(size = 12),  
    legend.position = "right",  
    legend.title = element_text(size = 12), 
    legend.text = element_text(size = 10),  
    plot.title = element_text(face = "bold", size = 14),  
    panel.grid.major.x = element_blank(),  
    panel.grid.minor.x = element_blank()  
  ) +
  scale_y_continuous(
    labels = format_with_apostrophe,  
    limits = c(0, NA) 
  ) + 
  scale_x_continuous(breaks = seq(2008, 2019, by = 1))  

# display the plot 
plot6

9.12 Marginal Effects of EU Immigration on Public Welfare State Support - Model 1

This code plots the predicted level of welfare state support across the range of the logged four-year cumulative EU immigration variable, based on the first hierarchical linear regression model (model1). The line shows the predicted values returned by ggpredict(), and the shaded band shows the corresponding confidence interval.
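
The plot in this section relies on four columns of the ggpredict() output: `x`, `predicted`, `conf.low` and `conf.high`. To illustrate what these columns contain, the sketch below builds an analogous table by hand for a simple `lm()` fit on simulated data. The model, the data and the coefficient values are stand-ins chosen for illustration; they are unrelated to model1, which is a hierarchical model.

```r
# Simulated data (hypothetical); a simple linear model stands in for model1
set.seed(1)
toy <- data.frame(x = seq(8, 13, length.out = 50))
toy$y <- 5 - 0.2 * toy$x + rnorm(50, sd = 0.3)

fit <- lm(y ~ x, data = toy)

# Evaluate the model on a grid of predictor values
grid <- data.frame(x = seq(8, 13, by = 0.5))
ci <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Same column names as used in the aes() mapping of the plot
pred <- data.frame(
  x         = grid$x,
  predicted = ci[, "fit"],
  conf.low  = ci[, "lwr"],
  conf.high = ci[, "upr"]
)

head(pred)
```

A table of this shape can be passed to `ggplot()` with `geom_ribbon()` and `geom_line()` in the same way as the ggpredict() output.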

# Generate predicted values
predicted_values <- ggpredict(model1, 
                              terms = "log_EU_immigration_cumulative_4yr")

# Generate the plot 
plot_model1 <- ggplot(predicted_values, aes(x = x, 
                                            y = predicted, 
                                            ymin = conf.low, 
                                            ymax = conf.high)) +
  geom_ribbon(alpha = 0.2, fill = "#449777") +  # Confidence interval shading
  geom_line(color = "#449777", size = 1) +      # Line for predicted values
  labs(
    title = "The Impact of EU Immigration on Predicted Support for the Welfare State",
    x = "Log of EU Immigration (Cumulative 4 Years)",
    y = "Predicted Support for the Welfare State"
  ) +
  theme_minimal(base_size = 15) +  # Minimal theme with base size
  theme(
    plot.title = element_text(face = "bold"),  # Make title bold
    axis.text = element_text(size = 12),  # Set axis text size
    axis.title = element_text(size = 14)  # Set axis title size
  ) +  
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +  # Improve x-axis breaks
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))  # Improve y-axis breaks

# Display the plot
print(plot_model1) 

9.13 Marginal Effects of EU Emigration on Public Welfare State Support - Model 2

This code plots the predicted level of welfare state support as a function of the logged four-year cumulative EU emigration variable, based on the second hierarchical linear regression model (model2). Predictions are evaluated at fixed values of the predictor from 8 to 13 in steps of 0.5, and the shaded band shows the confidence interval.
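
The ggpredict() call in this section passes the evaluation grid as a hand-typed list of values inside the `terms` string. As a small sketch, the same string can be assembled from `seq()`, which avoids typing errors and allows the same grid to be reused for the axis breaks. The object names `grid_values` and `terms_string` are hypothetical.

```r
# Grid of predictor values: 8 to 13 in steps of 0.5
grid_values <- seq(8, 13, by = 0.5)

# Assemble the terms string in the "variable [v1, v2, ...]" form
terms_string <- paste0(
  "log_EU_emigration_cumulative_4yr [",
  paste(grid_values, collapse = ", "),
  "]"
)

print(terms_string)
```

The resulting string can be supplied as the `terms` argument, and `grid_values` as the `breaks` argument of `scale_x_continuous()`.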

# Generate predicted values
predicted_values <- ggpredict(model2, terms = "log_EU_emigration_cumulative_4yr [8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13]")

# Plot the predicted values as a line with a confidence band
plot_model2 <- ggplot(predicted_values, aes(x = x, 
                                            y = predicted, ymin = conf.low, 
                                            ymax = conf.high)) +
  geom_ribbon(alpha = 0.2, fill = "#449777") +  # Confidence interval shading
  geom_line(color = "#449777", size = 1) +  
  labs(
    title = "The Impact of EU Emigration on Predicted Support for the Welfare State",
    x = "Log of EU Emigration (Cumulative 4 Years)",
    y = "Predicted Support for the Welfare State"
  ) +
  theme_minimal(base_size = 15) +  
  theme(
    plot.title = element_text(face = "bold"),  
    axis.text = element_text(size = 12),  
    axis.title = element_text(size = 14) 
  ) +
  scale_x_continuous(
    breaks = c(8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13),
    limits = c(8, 13) 
  ) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))  

# Display the plot
print(plot_model2)

9.14 Marginal Effects of Regional EU Immigration on Public Welfare State Support

This code calculates predicted welfare state support as a function of the regional share of EU-born immigrants, using the model1_regional regression model, and plots the predicted trend line together with its confidence interval. The x-axis represents the share of EU-born immigrants by NUTS 2 region, while the y-axis displays the predicted level of welfare state support.

# Generate predicted values based on model1_regional
predicted_values <- ggpredict(model1_regional, 
                              terms = "regional_foreign_born_EU_share [all]")

# Create the plot
plot_regional_immigration <- ggplot(predicted_values, aes(x = x, 
                                                          y = predicted, 
                                                          ymin = conf.low, 
                                                          ymax = conf.high)) +
  geom_ribbon(alpha = 0.2, fill = "#449777") +  # Confidence interval
  geom_line(color = "#449777", size = 1) +      # Predicted line
  labs(
    title = "Predicted Welfare State Support by Regional Share of EU-born Immigrants",
    x = "Share of EU-born Immigrants (%) by NUTS 2 Regions",
    y = "Predicted Welfare State Support"
  ) +
  theme_minimal(base_size = 15) + 
  theme(
    plot.title = element_text(face = "bold"), 
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)) + 
  scale_x_continuous(breaks = seq(0, 0.35, by = 0.05), 
                     limits = c(0, 0.35))
  
# Show the plot
plot_regional_immigration 

9.15 Marginal Effects of Regional Emigration on Public Welfare State Support in the EU

This code generates a plot to illustrate the effect of regional emigration on predicted welfare state support, using predictions calculated from the model2_regional regression model. The predictions are evaluated over all observed values of the logged four-year cumulative emigration variable, and the plot shows the predicted trend line together with its confidence interval.

# Generate predicted values 
predicted_values_emigration <- ggpredict(model2_regional, 
                                         terms = "logemigration_extent_cumulative_4yr [all]")

# Create the plot (default x-axis breaks and limits)
plot_regional_emigration <- ggplot(predicted_values_emigration, 
                                   aes(x = x, 
                                       y = predicted, 
                                       ymin = conf.low, 
                                       ymax = conf.high)) +
  geom_ribbon(alpha = 0.2, fill = "#449777") +  # Confidence interval
  geom_line(color = "#449777", size = 1) +      # Predicted line
  labs(
    title = "Predicted Welfare State Support by Regional Emigration",
    x = "Log of Emigration (Cumulative 4 Years) by NUTS 2 Regions",
    y = "Predicted Welfare State Support"
  ) + 
  theme_minimal(base_size = 15) +  
  theme(
    plot.title = element_text(face = "bold"), 
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )

# Show the plot 
plot_regional_emigration